Search Results: "dom"

21 March 2024

Ravi Dwivedi: Thailand Trip

This post is the second and final part of my Malaysia-Thailand trip. Feel free to check out the Malaysia part here if you haven t already. Kuala Lumpur to Bangkok is around 1500 km by road, and so I took a Malaysian Airlines flight to travel to Bangkok. The flight staff at the Kuala Lumpur only asked me for a return/onward flight and Thailand immigration asked a few questions but did not check any documents (obviously they checked and stamped my passport ;)). The currency of Thailand is the Thai baht, and 1 Thai baht = 2.5 Indian Rupees. The Thailand time is 1.5 hours ahead of Indian time (For example, if it is 12 noon in India, it will be 13:30 in Thailand). I landed in Bangkok at around 3 PM local time. Fletcher was in Bangkok that time, leaving for Pattaya and we had booked the same hostel. So I took a bus to Pattaya from the airport. The next bus for which the tickets were available was at 7 PM, so I took tickets for that one. The bus ticket cost was 143 Thai Baht. I didn t buy SIM at the airport, thinking there must be better deals in the city. As a consequence, there was no way to contact Fletcher through internet. Although I had a few minutes call remaining out of my international roaming pack.

Our accommodation was near Jomtien beach, so I got off at the last stop, as the bus terminates at the Jomtien beach. Then I decided to walk towards my accommodation. I was using OsmAnd for navigation. However, the place was not marked on OpenStreetMap, and it turned out I missed the street my hostel was on and walked around 1 km further as I was chasing a similarly named incorrect hostel on OpenStreetMap. Then I asked for help from two men sitting at a caf . One of them said he will help me find the street my hostel is on. So, I walked with him, and he told me he lives in Thailand for many years, but he is from Kuwait. He also gave me valuable information. Like, he told me about shared hail-and-ride songthaews which run along the Jomtien Second Road and charge 10 Baht for any distance on their route. This tip significantly reduced our expenses. Further, he suggested me 7-Eleven shops for buying a local SIM. Like Malaysia, Thailand has 24/7 7-Eleven convenience stores, a lot of them not even 100 m apart. The Kuwaiti person dropped me at the address where my hostel was. I tried searching for a person in-charge of that hostel, and soon I realized there was no reception. After asking for help from locals for some time, I bumped into Fletcher, who also came to this address and was searching for the same. After finding a friend, I felt a sigh of relief. Adjacent to the property, there was a hairdresser shop. We went there and asked about this property. The woman called the owner, and she also told us the required passcodes to go inside. Our accommodation was in a room on the second floor, which required us to put a passcode for opening. We entered the passcode and entered the room. So, we stayed at this hostel which had no reception. Due to this, it took 2 hours to find our room and enter. It reminded me of a difficult experience I had in Albania, where me and Akshat were not able to find our apartment in one of the hottest days and the owner didn t know our language. Traveling from the place where the bus dropped me to the hostel, I saw streets were filled with bars and massage parlors, which was expected. Prostitutes were everywhere. We went out at night towards the beach and also roamed around in 7-Elevens to buy a SIM card for myself. I got a SIM for 7 day unlimited internet for 399 baht. Turns out that the rates of SIM cards at the airport were not so different from inside the city.

In terms of speaking English, locals didn t know English at all in both Pattaya and Bangkok. I normally don t expect locals to know English in a non-English speaking country, but the fact that Bangkok is one of the most visited places by tourists made me expect locals to know some English. Talking to locals is an integral part of travel for me, which I couldn t do a lot in Thailand. This aspect is much more important for me than going to touristy places. So, we were in Pattaya. Next morning, Fletcher and I went to Tiger park using shared songthaew. After that, we planned to visit Pattaya Floating market which is near the Tiger Park, but we felt the ticket prices were higher than it was worth. Fletcher had to leave for Bangkok on that day. I suggested him to go to Suvarnabhumi Airport from the Jomtien beach bus terminal (this was the route I took the last day in opposite direction) to avoid traffic congestion inside Bangkok, as he can follow up with metro once he reaches the airport. From the floating market, we were walking in sweltering heat to reach the Jomtien beach. I tried asking for a lift and eventually got successful as a scooty stopped, and surprisingly the person gave a ride to both of us. He was from Delhi, so maybe that s the reason he stopped for us. Then we took a songthaew to the bus terminal and after having lunch, Fletcher left for Bangkok.

Next day I went to Bangkok, but Fletcher already left for Kuala Lumpur. Here I had booked a private room in a hotel (instead of a hostel) for four nights, mainly because of my luggage. This costed 5600 INR for four nights. It was 2 km from the metro station, which I used to walk both sides. In Bangkok, I visited Sukhumvit and Siam by metro. Going to some areas require crossing the Chao Phraya river. For this, I took Chao Phraya Express Boat for going to places like Khao San road and Wat Arun. I would recommend taking the boat ride as it had very good views. In Bangkok, I met a person from Pakistan staying in my hotel and so here also I got some company. But by the time I met him, my days were almost over. So, we went to a random restaurant selling Indian food where we ate some paneer dish with naan and that restaurant person was from Myanmar.

For eating, I mainly relied on fruits and convenience stores. Bananas were very tasty. This was the first time I saw banana flesh being yellow. Mangoes were delicious and pineapples were smaller and flavorful. I also ate Rose Apple, which I never had before. I had Chhole Kulche once in Sukhumvit. That was a little expensive as it costed 164 baht. I also used to buy premix coffee packets from 7-Eleven convenience stores and prepare them inside the stores.

My booking from Bangkok to Delhi was in Air India flight, and they were serving alcohol in the flight. I chose red wine, and this was my first time having alcohol in a flight.

Notes

In this whole trip spanning two weeks, I did not pay for drinking water (except for once in Pattaya which was 9 baht) and toilets. Bangkok and Kuala Lumpur have plenty of malls where you should find a free-of-cost toilet nearby. For drinking water, I relied mainly on my accommodation providing refillable water for my bottle.

Thailand seemed more expensive than Malaysia on average. Malaysia had discounted price due to the Chinese New year.

I liked Pattaya more than Bangkok. Maybe because Pattaya has beach and Bangkok doesn t. Pattaya seemed more lively, and I could meet and talk to a few people as opposed to Bangkok.

Chao Phraya River express boat costs 150 baht for one day where you can hop on and off to any boat.

14 March 2024

Freexian Collaborators: Monthly report about Debian Long Term Support, February 2024 (by Roberto C. S nchez)

Like each month, have a look at the work funded by Freexian s Debian LTS offering.

Debian LTS contributors In February, 18 contributors have been paid to work on Debian LTS, their reports are available:

Abhijith PA did 10.0h (out of 14.0h assigned), thus carrying over 4.0h to the next month.

Adrian Bunk did 13.5h (out of 24.25h assigned and 41.75h from previous period), thus carrying over 52.5h to the next month.

Bastien Roucari s did 20.0h (out of 20.0h assigned).

Ben Hutchings did 2.0h (out of 14.5h assigned and 9.5h from previous period), thus carrying over 22.0h to the next month.

Chris Lamb did 18.0h (out of 18.0h assigned).

Daniel Leidert did 10.0h (out of 10.0h assigned).

Emilio Pozuelo Monfort did 3.0h (out of 28.25h assigned and 31.75h from previous period), thus carrying over 57.0h to the next month.

Guilhem Moulin did 7.25h (out of 4.75h assigned and 15.25h from previous period), thus carrying over 12.75h to the next month.

Holger Levsen did 0.5h (out of 3.5h assigned and 8.5h from previous period), thus carrying over 11.5h to the next month.

Lee Garrett did 0.0h (out of 18.25h assigned and 41.75h from previous period), thus carrying over 60.0h to the next month.

Markus Koschany did 40.0h (out of 40.0h assigned).

Roberto C. S nchez did 3.5h (out of 8.75h assigned and 3.25h from previous period), thus carrying over 8.5h to the next month.

Santiago Ruano Rinc n did 13.5h (out of 13.5h assigned and 2.5h from previous period), thus carrying over 2.5h to the next month.

Sean Whitton did 4.5h (out of 0.5h assigned and 5.5h from previous period), thus carrying over 1.5h to the next month.

Sylvain Beucler did 24.5h (out of 27.75h assigned and 32.25h from previous period), thus carrying over 35.5h to the next month.

Thorsten Alteholz did 14.0h (out of 14.0h assigned).

Tobias Frost did 12.0h (out of 12.0h assigned).

Utkarsh Gupta did 11.25h (out of 26.75h assigned and 33.25h from previous period), thus carrying over 48.75 to the next month.

Evolution of the situation In February, we have released 17 DLAs. The number of DLAs published during February was a bit lower than usual, as there was much work going on in the area of triaging CVEs (a number of which turned out to not affect Debia buster, and others which ended up being duplicates, or otherwise determined to be invalid). Of the packages which did receive updates, notable were sudo (to fix a privilege management issue), and iwd and wpa (both of which suffered from authentication bypass vulnerabilities). While this has already been already announced in the Freexian blog, we would like to mention here the start of the Long Term Support project for Samba 4.17. You can find all the important details in that post, but we would like to highlight that it is thanks to our LTS sponsors that we are able to fund the work from our partner, Catalyst, towards improving the security support of Samba in Debian 12 (Bookworm).

Thanks to our sponsors Sponsors that joined recently are in bold.

Platinum sponsors:

TOSHIBA (for 102 months)

Civil Infrastructure Platform (CIP) (for 70 months)

Gold sponsors:

Roche Diagnostics International AG (for 113 months)

Linode (for 107 months)

Babiel GmbH (for 96 months)

Plat Home (for 96 months)

CINECA (for 70 months)

University of Oxford (for 52 months)

Deveryware (for 39 months)

VyOS Inc (for 34 months)

EDF SA (for 23 months)

Silver sponsors:

Domeneshop AS (for 117 months)

Nantes M tropole (for 112 months)

Univention GmbH (for 103 months)

Universit Jean Monnet de St Etienne (for 103 months)

Ribbon Communications, Inc. (for 97 months)

Exonet B.V. (for 87 months)

Leibniz Rechenzentrum (for 81 months)

Minist re de l Europe et des Affaires trang res (for 64 months)

Cloudways by DigitalOcean (for 54 months)

Dinahosting SL (for 52 months)

Bauer Xcel Media Deutschland KG (for 46 months)

Platform.sh SAS (for 46 months)

Moxa Inc. (for 40 months)

sipgate GmbH (for 37 months)

OVH US LLC (for 35 months)

Tilburg University (for 35 months)

GSI Helmholtzzentrum f r Schwerionenforschung GmbH (for 27 months)

Soliton Systems K.K. (for 24 months)

Bronze sponsors:

Evolix (for 118 months)

Seznam.cz, a.s. (for 118 months)

Intevation GmbH (for 115 months)

Linuxhotel GmbH (for 115 months)

Daevel SARL (for 113 months)

Bitfolk LTD (for 112 months)

Megaspace Internet Services GmbH (for 112 months)

NUMLOG (for 112 months)

Greenbone AG (for 111 months)

WinGo AG (for 111 months)

Ecole Centrale de Nantes - LHEEA (for 107 months)

Entr ouvert (for 102 months)

Adfinis AG (for 99 months)

GNI MEDIA (for 94 months)

Laboratoire LEGI - UMR 5519 / CNRS (for 94 months)

Tesorion (for 94 months)

Bearstech (for 85 months)

LiHAS (for 85 months)

Catalyst IT Ltd (for 80 months)

Supagro (for 75 months)

Demarcq SAS (for 74 months)

Universit Grenoble Alpes (for 60 months)

TouchWeb SAS (for 52 months)

SPiN AG (for 49 months)

CoreFiling (for 44 months)

Institut des sciences cognitives Marc Jeannerod (for 39 months)

Observatoire des Sciences de l Univers de Grenoble (for 36 months)

Tem Innovations GmbH (for 31 months)

WordFinder.pro (for 30 months)

CNRS DT INSU R sif (for 29 months)

Alter Way (for 22 months)

Institut Camille Jordan (for 11 months)

11 March 2024

Joachim Breitner: Convenient sandboxed development environment

I like using one machine and setup for everything, from serious development work to hobby projects to managing my finances. This is very convenient, as often the lines between these are blurred. But it is also scary if I think of the large number of people who I have to trust to not want to extract all my personal data. Whenever I run a cabal install, or a fun VSCode extension gets updated, or anything like that, I am running code that could be malicious or buggy. In a way it is surprising and reassuring that, as far as I can tell, this commonly does not happen. Most open source developers out there seem to be nice and well-meaning, after all.

Convenient or it won t happen Nevertheless I thought I should do something about this. The safest option would probably to use dedicated virtual machines for the development work, with very little interaction with my main system. But knowing me, that did not seem likely to happen, as it sounded like a fair amount of hassle. So I aimed for a viable compromise between security and convenient, and one that does not get too much in the way of my current habits. For instance, it seems desirable to have the project files accessible from my unconstrained environment. This way, I could perform certain actions that need access to secret keys or tokens, but are (unlikely) to run code (e.g. `git push`, `git pull` from private repositories, `gh pr create`) from the outside , and the actual build environment can do without access to these secrets. The user experience I thus want is a quick way to enter a development environment where I can do most of the things I need to do while programming (network access, running command line and GUI programs), with access to the current project, but without access to my actual `/home` directory. I initially followed the blog post Application Isolation using NixOS Containers by Marcin Sucharski and got something working that mostly did what I wanted, but then a colleague pointed out that tools like `firejail` can achieve roughly the same with a less global setup. I tried to use `firejail`, but found it to be a bit too inflexible for my particular whims, so I ended up writing a small wrapper around the lower level sandboxing tool https://github.com/containers/bubblewrap.

Selective bubblewrapping This script, called `dev` and included below, builds a new filesystem namespace with minimal `/proc` and `/dev` directories, it s own `/tmp` directories. It then binds-mound some directories to make the host s NixOS system available inside the container (`/bin`, `/usr`, the nix store including domain socket, stuff for OpenGL applications). My user s home directory is taken from `~/.dev-home` and some configuration files are bind-mounted for convenient sharing. I intentionally don t share most of the configuration for example, a `direnv enable` in the dev environment should not affect the main environment. The X11 socket for graphical applications and the corresponding `.Xauthority` file is made available. And finally, if I run `dev` in a project directory, this project directory is bind mounted writable, and the current working directory is preserved. The effect is that I can type `dev` on the command line to enter dev mode rather conveniently. I can run development tools, including graphical ones like VSCode, and especially the latter with its extensions is part of the sandbox. To do a `git push` I either exit the development environment (Ctrl-D) or open a separate terminal. Overall, the inconvenience of switching back and forth seems worth the extra protection. Clearly, isn t going to hold against a determined and maybe targeted attacker (e.g. access to the X11 and the nix daemon socket can probably be used to escape easily). But I hope it will help against a compromised dev dependency that just deletes or exfiltrates data, like keys or passwords, from the usual places in `$HOME`.

Rough corners There is more polishing that could be done.

In particular, clicking on a link inside VSCode in the container will currently open Firefox inside the container, without access to my settings and cookies etc. Ideally, links would be opened in the Firefox running outside. This is a problem that has a solution in the world of applications that are sandboxed with Flatpak, and involves a bunch of moving parts (a xdg-desktop-portal user service, a filtering dbus proxy, exposing access to that proxy in the container). I experimented with that for a bit longer than I should have, but could not get it to work to satisfaction (even without a container involved, I could not get `xdg-desktop-portal` to heed my default browser settings ). For now I will live with manually copying and pasting URLs, we ll see how long this lasts.

With this setup (and unlike the NixOS container setup I tried first), the same applications are installed inside and outside. It might be useful to separate the set of installed programs: There is simply no point in running `evolution` or `firefox` inside the container, and if I do not even have VSCode or `cabal` available outside, so that it s less likely that I forget to enter `dev` before using these tools. It shouldn t be too hard to cargo-cult some of the NixOS Containers infrastructure to be able to have a separate system configuration that I can manage as part of my normal system configuration and make available to `bubblewrap` here.

So likely I will refine this some more over time. Or get tired of typing `dev` and going back to what I did before

The script

The dev script (at the time of writing)

#!/usr/bin/env bash

extra=()
if [[ "$PWD" == /home/jojo/build/* ]]   [[ "$PWD" == /home/jojo/projekte/programming/* ]]
then
extra+=(--bind "$PWD" "$PWD" --chdir "$PWD")
fi

if [ -n "$1" ]
then
    cmd=( "$@" )
else
    cmd=( bash )
fi

# Caveats:
# * access to all of  /etc 
# * access to  /nix/var/nix/daemon-socket/socket , and is trusted user (but needed to run nix)
# * access to X11

exec bwrap \
  --unshare-all \
\
 # blank slate  \
  --share-net \
  --proc /proc \
  --dev /dev \
  --tmpfs /tmp \
  --tmpfs /run/user/1000 \
\
 # Needed for GLX applications, in paticular alacritty  \
  --dev-bind /dev/dri /dev/dri \
  --ro-bind /sys/dev/char /sys/dev/char \
  --ro-bind /sys/devices/pci0000:00 /sys/devices/pci0000:00 \
  --ro-bind /run/opengl-driver /run/opengl-driver \
\
  --ro-bind /bin /bin \
  --ro-bind /usr /usr \
  --ro-bind /run/current-system /run/current-system \
  --ro-bind /nix /nix \
  --ro-bind /etc /etc \
  --ro-bind /run/systemd/resolve/stub-resolv.conf /run/systemd/resolve/stub-resolv.conf \
\
  --bind ~/.dev-home /home/jojo \
  --ro-bind ~/.config/alacritty ~/.config/alacritty  \
  --ro-bind ~/.config/nvim ~/.config/nvim  \
  --ro-bind ~/.local/share/nvim ~/.local/share/nvim  \
  --ro-bind ~/.bin ~/.bin \
\
  --bind /tmp/.X11-unix/X0 /tmp/.X11-unix/X0 \
  --bind ~/.Xauthority ~/.Xauthority \
  --setenv DISPLAY :0 \
\
  --setenv container dev \
  "$ extra[@] " \
  -- \
  "$ cmd[@] "

9 March 2024

Reproducible Builds: Reproducible Builds in February 2024

Welcome to the February 2024 report from the Reproducible Builds project! In our reports, we try to outline what we have been up to over the past month as well as mentioning some of the important things happening in software supply-chain security.

Reproducible Builds at FOSDEM 2024 Core Reproducible Builds developer Holger Levsen presented at the main track at FOSDEM on Saturday 3rd February this year in Brussels, Belgium. However, that wasn t the only talk related to Reproducible Builds. However, please see our comprehensive FOSDEM 2024 news post for the full details and links.

Maintainer Perspectives on Open Source Software Security Bernhard M. Wiedemann spotted that a recent report entitled Maintainer Perspectives on Open Source Software Security written by Stephen Hendrick and Ashwin Ramaswami of the Linux Foundation sports an infographic which mentions that 56% of [polled] projects support reproducible builds .

Three new reproducibility-related academic papers A total of three separate scholarly papers related to Reproducible Builds have appeared this month: Signing in Four Public Software Package Registries: Quantity, Quality, and Influencing Factors by Taylor R. Schorlemmer, Kelechi G. Kalu, Luke Chigges, Kyung Myung Ko, Eman Abdul-Muhd, Abu Ishgair, Saurabh Bagchi, Santiago Torres-Arias and James C. Davis (Purdue University, Indiana, USA) is concerned with the problem that:
Package maintainers can guarantee package authorship through software signing [but] it is unclear how common this practice is, and whether the resulting signatures are created properly. Prior work has provided raw data on signing practices, but measured single platforms, did not consider time, and did not provide insight on factors that may influence signing. We lack a comprehensive, multi-platform understanding of signing adoption and relevant factors. This study addresses this gap. (arXiv, full PDF)

Reproducibility of Build Environments through Space and Time by Julien Malka, Stefano Zacchiroli and Th o Zimmermann (Institut Polytechnique de Paris, France) addresses:
[The] principle of reusability [ ] makes it harder to reproduce projects build environments, even though reproducibility of build environments is essential for collaboration, maintenance and component lifetime. In this work, we argue that functional package managers provide the tooling to make build environments reproducible in space and time, and we produce a preliminary evaluation to justify this claim.
The abstract continues with the claim that Using historical data, we show that we are able to reproduce build environments of about 7 million Nix packages, and to rebuild 99.94% of the 14 thousand packages from a 6-year-old Nixpkgs revision. (arXiv, full PDF)
Options Matter: Documenting and Fixing Non-Reproducible Builds in Highly-Configurable Systems by Georges Aaron Randrianaina, Djamel Eddine Khelladi, Olivier Zendra and Mathieu Acher (Inria centre at Rennes University, France):
This paper thus proposes an approach to automatically identify configuration options causing non-reproducibility of builds. It begins by building a set of builds in order to detect non-reproducible ones through binary comparison. We then develop automated techniques that combine statistical learning with symbolic reasoning to analyze over 20,000 configuration options. Our methods are designed to both detect options causing non-reproducibility, and remedy non-reproducible configurations, two tasks that are challenging and costly to perform manually. (HAL Portal, full PDF)

Mailing list highlights From our mailing list this month:

User cen posted a query asking How to verify a package by rebuilding it locally on Debian which received a followup from Vagrant Cascadian.

James Addison asked Two questions about build-path reproducibility in Debian regarding the differences in the testing performed by Debian s GitLab continuous integration (CI) pipeline and the Debian-specific testing performed by the Reproducible Builds project itself, and followed this with a separate but related question regarding misconfigured reprotest configurations.

Distribution work In Debian this month, 5 reviews of Debian packages were added, 22 were updated and 8 were removed this month adding to Debian s knowledge about identified issues. A number of issue types were updated as well. [ ][ ][ ][ ] In addition, Roland Clobus posted his 23rd update of the status of reproducible ISO images on our mailing list. In particular, Roland helpfully summarised that all major desktops build reproducibly with bullseye, bookworm, trixie and sid provided they are built for a second time within the same DAK run (i.e. [within] 6 hours) and that there will likely be further work at a MiniDebCamp in Hamburg. Furthermore, Roland also responded in-depth to a query about a previous report
Fedora developer Zbigniew J drzejewski-Szmek announced a work-in-progress script called `fedora-repro-build` that attempts to reproduce an existing package within a koji build environment. Although the projects `README` file lists a number of fields will always or almost always vary and there is a non-zero list of other known issues, this is an excellent first step towards full Fedora reproducibility.
Jelle van der Waa introduced a new linter rule for Arch Linux packages in order to detect cache files leftover by the Sphinx documentation generator which are unreproducible by nature and should not be packaged. At the time of writing, 7 packages in the Arch repository are affected by this.
Elsewhere, Bernhard M. Wiedemann posted another monthly update for his work elsewhere in openSUSE.

diffoscope diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made a number of changes such as uploading versions `256`, `257` and `258` to Debian and made the following additional changes:

Use a deterministic name instead of trusting `gpg` s use-embedded-filenames. Many thanks to Daniel Kahn Gillmor dkg@debian.org for reporting this issue and providing feedback. [ ][ ]

Don t error-out with a traceback if we encounter `struct.unpack`-related errors when parsing Python `.pyc` files. (#1064973). [ ]

Don t try and compare `rdb_expected_diff` on non-GNU systems as `%p` formatting can vary, especially with respect to MacOS. [ ]

Fix compatibility with `pytest` 8.0. [ ]

Temporarily fix support for Python 3.11.8. [ ]

Use the `7zip` package (over `p7zip-full`) after a Debian package transition. (#1063559). [ ]

Bump the minimum Black source code reformatter requirement to 24.1.1+. [ ]

Expand an older changelog entry with a CVE reference. [ ]

Make `test_zip` black clean. [ ]

In addition, James Addison contributed a patch to parse the headers from the `diff(1)` correctly [ ][ ] thanks! And lastly, Vagrant Cascadian pushed updates in GNU Guix for diffoscope to version 255, 256, and 258, and updated trydiffoscope to 67.0.6.

reprotest reprotest is our tool for building the same source code twice in different environments and then checking the binaries produced by each build for any differences. This month, Vagrant Cascadian made a number of changes, including:

Create a (working) proof of concept for enabling a specific number of CPUs. [ ][ ]

Consistently use 398 days for time variation rather than choosing randomly and update `README.rst` to match. [ ][ ]

Support a new `--vary=build_path.path` option. [ ][ ][ ][ ]

Website updates There were made a number of improvements to our website this month, including:

Chris Lamb:

Improve the relative sizing of headers. [ ]

Re-order and punch up the introduction and documentation on the `SOURCE_DATE_EPOCH` page. [ ]

Update `SOURCE_DATE_EPOCH` documentation re. `datetime.datetime.fromtimestamp`. Thanks, James Addison. [ ]

Add a post about Reproducible Builds at FOSDEM 2024. [ ]

Holger Levsen:

Update the GNU Guix page to include their reproducibility QA page. [ ]

Add Sune Vuorela and Jan-Benedict Glaw to our contributors list. [ ][ ]

Mattia Rizzolo:

Add Sovereign Tech Fund s logo to our sponsors. [ ]

Update our sponsors list. [ ]

Reproducibility testing framework The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In February, a number of changes were made by Holger Levsen:

Debian-related changes:

Temporarily disable upgrading/bootstrapping Debian unstable and experimental as they are currently broken. [ ][ ]

Use the 64-bit `amd64` kernel on all `i386` nodes; no more 686 PAE kernels. [ ]

Add an Erlang package set. [ ]

Other changes:

Grant Jan-Benedict Glaw shell access to the Jenkins node. [ ]

Enable debugging for NetBSD reproducibility testing. [ ]

Use `/usr/bin/du --apparent-size` in the Jenkins shell monitor. [ ]

Revert reproducible nodes: mark osuosl2 as down . [ ]

Thanks again to Codethink, for they have doubled the RAM on our `arm64` nodes. [ ]

Only set `/proc/$pid/oom_score_adj` to -1000 if it has not already been done. [ ]

Add the `opemwrt-target-tegra` and `jtx` task to the list of zombie jobs. [ ][ ]

Vagrant Cascadian also made the following changes:

Overhaul the handling of OpenSSH configuration files after updating from Debian bookworm. [ ][ ][ ]

Add two new `armhf` architecture build nodes, `virt32z` and `virt64z`, and insert them into the Munin monitoring. [ ][ ] [ ][ ]

In addition, Alexander Couzens updated the OpenWrt configuration in order to replace the `tegra` target with `mpc85xx` [ ], Jan-Benedict Glaw updated the NetBSD build script to use a separate `$TMPDIR` to mitigate out of space issues on a tmpfs-backed `/tmp` [ ] and Zheng Junjie added a link to the GNU Guix tests [ ]. Lastly, node maintenance was performed by Holger Levsen [ ][ ][ ][ ][ ][ ] and Vagrant Cascadian [ ][ ][ ][ ].

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Philip Rinn:

`gimagereader` (date)

Bernhard M. Wiedemann:

`grass` (date-related issue)

`grub2` (filesystem ordering issue)

`latex2html` (drop a non-deterministic log)

`mhvtl` (tar)

`obs` (build-tool issue)

`ollama` (GZip embedding the modification time)

`presenterm` (filesystem-ordering issue)

`qt6-quick3d` (parallelism)

Chris Lamb:

#1064506 filed against `geophar`.

#1064891 filed against `pytest-repeat`.

#1064892 filed against `klepto`.

James Addison:

#1064519 filed against `flask-limiter`.

`python-parsl-doc` (disable dynamic argument evaluation by Sphinx `autodoc` extension)

`python3-pytest-repeat` (remove `entry_points.txt` creation that varied by shell)

`python3-selinux` (remove packaged `direct_url.json` file that embeds build path)

`python3-sepolicy` (remove packaged `direct_url.json` file that embeds build path)

#1064575 filed against `pyswarms`.

#1064638 filed against `python-x2go`.

`snapd` (fix timestamp header in packaged manual-page)

`zzzeeksphinx` (existing RB patch forwarded and merged (with modifications))

Johannes Schauer Marin Rodrigues:

#1063939 filed against `fop`.

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

IRC: `#reproducible-builds` on `irc.oftc.net`.

Twitter: @ReproBuilds

Mastodon: @reproducible_builds@fosstodon.org

Mailing list: `rb-general@lists.reproducible-builds.org`

7 March 2024

Gunnar Wolf: Constructed truths truth and knowledge in a post-truth world

This post is a review for Computing Reviews for Constructed truths truth and knowledge in a post-truth world , a book published in Springer Link

Many of us grew up used to having some news sources we could implicitly trust, such as well-positioned newspapers and radio or TV news programs. We knew they would only hire responsible journalists rather than risk diluting public trust and losing their brand s value. However, with the advent of the Internet and social media, we are witnessing what has been termed the post-truth phenomenon. The undeniable freedom that horizontal communication has given us automatically brings with it the emergence of filter bubbles and echo chambers, and truth seems to become a group belief. Contrary to my original expectations, the core topic of the book is not about how current-day media brings about post-truth mindsets. Instead it goes into a much deeper philosophical debate: What is truth? Does truth exist by itself, objectively, or is it a social construct? If activists with different political leanings debate a given subject, is it even possible for them to understand the same points for debate, or do they truly experience parallel realities? The author wrote this book clearly prompted by the unprecedented events that took place in 2020, as the COVID-19 crisis forced humanity into isolation and online communication. Donald Trump is explicitly and repeatedly presented throughout the book as an example of an actor that took advantage of the distortions caused by post-truth. The first chapter frames the narrative from the perspective of information flow over the last several decades, on how the emergence of horizontal, uncensored communication free of editorial oversight started empowering the netizens and created a temporary information flow utopia. But soon afterwards, algorithmic gatekeepers started appearing, creating a set of personalized distortions on reality; users started getting news aligned to what they already showed interest in. This led to an increase in polarization and the growth of narrative-framing-specific communities that served as echo chambers for disjoint views on reality. This led to the growth of conspiracy theories and, necessarily, to the science denial and pseudoscience that reached unimaginable peaks during the COVID-19 crisis. Finally, when readers decide based on completely subjective criteria whether a scientific theory such as global warming is true or propaganda, or question what most traditional news outlets present as facts, we face the phenomenon known as fake news. Fake news leads to post-truth, a state where it is impossible to distinguish between truth and falsehood, and serves only a rhetorical function, making rational discourse impossible. Toward the end of the first chapter, the tone of writing quickly turns away from describing developments in the spread of news and facts over the last decades and quickly goes deep into philosophy, into the very thorny subject pursued by said discipline for millennia: How can truth be defined? Can different perspectives bring about different truth values for any given idea? Does truth depend on the observer, on their knowledge of facts, on their moral compass or in their honest opinions? Zoglauer dives into epistemology, following various thinkers ideas on what can be understood as truth: constructivism (whether knowledge and truth values can be learnt by an individual building from their personal experience), objectivity (whether experiences, and thus truth, are universal, or whether they are naturally individual), and whether we can proclaim something to be true when it corresponds to reality. For the final chapter, he dives into the role information and knowledge play in assigning and understanding truth value, as well as the value of second-hand knowledge: Do we really own knowledge because we can look up facts online (even if we carefully check the sources)? Can I, without any medical training, diagnose a sickness and treatment by honestly and carefully looking up its symptoms in medical databases? Wrapping up, while I very much enjoyed reading this book, I must confess it is completely different from what I expected. This book digs much more into the abstract than into information flow in modern society, or the impact on early 2020s politics as its editorial description suggests. At 160 pages, the book is not a heavy read, and Zoglauer s writing style is easy to follow, even across the potentially very deep topics it presents. Its main readership is not necessarily computing practitioners or academics. However, for people trying to better understand epistemology through its expressions in the modern world, it will be a very worthy read.

29 February 2024

Russell Coker: Links February 2024

In 2018 Charles Stross wrote an insightful blog post Dude You Broke the Future [1]. It covers AI in both fiction and fact and corporations (the real AIs) and the horrifying things they can do right now. LongNow has an interesting article about the concept of the Magnum Opus [2]. As an aside I ve been working on SE Linux for 22 years. Cory Doctorow wrote an insightful article about the incentives for enshittification of the Internet and how economic issues and regulations shape that [3]. CCC has a lot of great talks, and this talk from the latest CCC about the Triangulation talk on an attak on Kaspersky iPhones is particularly epic [4]. GoodCar is an online sales site for electric cars in Australia [5]. Ulrike wrote an insightful blog post about how the reliance on volunteer work in the FOSS community hurts diversity [6]. Cory Doctorow wrote an insightful article about The Internet s Original Sin which is misuse of copyright law [7]. He advocates for using copyright strictly for it s intended purpose and creating other laws for privacy, labor rights, etc. David Brin wrote an interesting article on neoteny and sexual selection in humans [8]. 37C3 has an interesting lecture about software licensing for a circular economy which includes environmental savings from better code [9]. Now they track efficiency in KDE bug reports!

26 February 2024

Freexian Collaborators: Long term support for Samba 4.17

Freexian is pleased to announce a partnership with Catalyst to extend the security support of Samba 4.17, which is the version packaged in Debian 12 Bookworm. Samba 4.17 will reach upstream s end-of-support this upcoming March (2024), and the goal of this partnership is to extend it until June 2028 (i.e. the end of Debian 12 s regular security support). One of the main aspects of this project is that it will also include support for Samba as Active Directory Domain Controller (AD-DC). Unfortunately, support for Samba as AD-DC in Debian 11 Bullseye, Debian 10 Buster and older releases has been discontinued before the end of the life cycle of those Debian releases. So we really expect to improve the situation of Samba in Debian 12 Bookworm, ensuring full support during the 5 years of regular security support. We would like to mention that this is an experiment, and we will do our best to make it a success, and to try to continue it for Samba versions included in future Debian releases. Our long term goal is to bring confidence to Samba s upstream development community that they can mark some releases as being supported for 5 years (or more) and that the corresponding work will be funded by companies that benefit from this work (because we would have already built that community). If your company relies on Samba and wants to help sustain LTS versions of Samba, please reach out to us. For companies using Debian, the simplest way is to subscribe to our Debian LTS offer at a gold level (or above) and let us know that you want to contribute to Samba LTS when you send your subscription form. For others, please reach out to us at sales@freexian.com and we will figure out a way to contribute. In the mean time, this project has been possible thanks to the current LTS sponsors and ELTS customers. We hope the whole community of Debian and Samba users will benefit from it. For any question, don t hesitate to contact us.

23 February 2024

Gunnar Wolf: 10 things software developers should learn about learning

This post is a review for Computing Reviews for 10 things software developers should learn about learning , a article published in Communications of the ACM

As software developers, we understand the detailed workings of the different components of our computer systems. And probably due to how computers were presented since their appearance as digital brains in the 1940s we sometimes believe we can transpose that knowledge to how our biological brains work, be it as learners or as problem solvers. This article aims at making the reader understand several mechanisms related to how learning and problem solving actually work in our brains. It focuses on helping expert developers convey knowledge to new learners, as well as learners who need to get up to speed and start coding. The article s narrative revolves around software developers, but much of what it presents can be applied to different problem domains. The article takes this mission through ten points, with roughly the same space given to each of them, starting with wrong assumptions many people have about the similarities between computers and our brains. The first section, Human Memory Is Not Made of Bits, explains the brain processes of remembering as a way of strengthening the force of a memory ( reconsolidation ) and the role of activation in related network pathways. The second section, Human Memory Is Composed of One Limited and One Unlimited System, goes on to explain the organization of memories in the brain between long-term memory (functionally limitless, permanent storage) and working memory (storing little amounts of information used for solving a problem at hand). However, the focus soon shifts to how experience in knowledge leads to different ways of using the same concepts, the importance of going from abstract to concrete knowledge applications and back, and the role of skills repetition over time. Toward the end of the article, the focus shifts from the mechanical act of learning to expertise. Section 6, The Internet Has Not Made Learning Obsolete, emphasizes that problem solving is not just putting together the pieces of a puzzle; searching online for solutions to a problem does not activate the neural pathways that would get fired up otherwise. The final sections tackle the differences that expertise brings to play when teaching or training a newcomer: the same tools that help the beginner s productivity as training wheels will often hamper the expert user s as their knowledge has become automated. The article is written with a very informal and easy-to-read tone and vocabulary, and brings forward several issues that might seem like commonsense but do ring bells when it comes to my own experiences both as a software developer and as a teacher. The article closes by suggesting several books that further expand on the issues it brings forward. While I could not identify a single focus or thesis with which to characterize this article, the several points it makes will likely help readers better understand (and bring forward to consciousness) mental processes often taken for granted, and consider often-overlooked aspects when transmitting knowledge to newcomers.

13 February 2024

Matthew Palmer: Not all TLDs are Created Equal

In light of the recent cancellation of the queer.af domain registration by the Taliban, the fragile and difficult nature of country-code top-level domains (ccTLDs) has once again been comprehensively demonstrated. Since many people may not be aware of the risks, I thought I d give a solid explainer of the whole situation, and explain why you should, in general, not have anything to do with domains which are registered under ccTLDs.

Top-level What-Now? A top-level domain (TLD) is the last part of a domain name (the collection of words, separated by periods, after the `https://` in your web browser s location bar). It s the com in `example.com`, or the af in `queer.af`. There are two kinds of TLDs: country-code TLDs (ccTLDs) and generic TLDs (gTLDs). Despite all being TLDs, they re very different beasts under the hood.

What s the Difference? Generic TLDs are what most organisations and individuals register their domains under: old-school technobabble like com , net , or org , historical oddities like gov , and the new-fangled world of words like tech , social , and bank . These gTLDs are all regulated under a set of rules created and administered by ICANN (the Internet Corporation for Assigned Names and Numbers ), which try to ensure that things aren t a complete wild-west, limiting things like price hikes (well, sometimes, anyway), and providing means for disputes over names¹. Country-code TLDs, in contrast, are all two letters long², and are given out to countries to do with as they please. While ICANN kinda-sorta has something to do with ccTLDs (in the sense that it makes them exist on the Internet), it has no authority to control how a ccTLD is managed. If a country decides to raise prices by 100x, or cancel all registrations that were made on the 12th of the month, there s nothing anyone can do about it. If that sounds bad, that s because it is. Also, it s not a theoretical problem the Taliban deciding to asssert its bigotry over the little corner of the Internet namespace it has taken control of is far from the first time that ccTLDs have caused grief.

Shifting Sands The `queer.af` cancellation is interesting because, at the time the domain was reportedly registered, 2018, Afghanistan had what one might describe as, at least, a different political climate. Since then, of course, things have changed, and the new bosses have decided to get a bit more active. Those running `queer.af` seem to have seen the writing on the wall, and were planning on moving to another, less fraught, domain, but hadn t completed that move when the Taliban came knocking.

The Curious Case of Brexit When the United Kingdom decided to leave the European Union, it fell foul of the EU s rules for the registration of domains under the eu ccTLD³. To register (and maintain) a domain name ending in `.eu`, you have to be a resident of the EU. When the UK ceased to be part of the EU, residents of the UK were no longer EU residents. Cue much unhappiness, wailing, and gnashing of teeth when this was pointed out to Britons. Some decided to give up their domains, and move to other parts of the Internet, while others managed to hold onto them by various legal sleight-of-hand (like having an EU company maintain the registration on their behalf). In any event, all very unpleasant for everyone involved.

Geopolitics on the Internet?!? After Russia invaded Ukraine in February 2022, the Ukranian Vice Prime Minister asked ICANN to suspend ccTLDs associated with Russia. While ICANN said that it wasn t going to do that, because it wouldn t do anything useful, some domain registrars (the companies you pay to register domain names) ceased to deal in Russian ccTLDs, and some websites restricted links to domains with Russian ccTLDs. Whether or not you agree with the sort of activism implied by these actions, the fact remains that even the actions of a government that aren t directly related to the Internet can have grave consequences for your domain name if it s registered under a ccTLD. I don t think any gTLD operator will be invading a neighbouring country any time soon.

Money, Money, Money, Must Be Funny When you register a domain name, you pay a registration fee to a registrar, who does administrative gubbins and causes you to be able to control the domain name in the DNS. However, you don t own that domain name⁴ you re only renting it. When the registration period comes to an end, you have to renew the domain name, or you ll cease to be able to control it. Given that a domain name is typically your brand or identity online, the chances are you d prefer to keep it over time, because moving to a new domain name is a massive pain, having to tell all your customers or users that now you re somewhere else, plus having to accept the risk of someone registering the domain name you used to have and capturing your traffic it s all a gigantic hassle. For gTLDs, ICANN has various rules around price increases and bait-and-switch pricing that tries to keep a lid on the worst excesses of registries. While there are any number of reasonable criticisms of the rules, and the Internet community has to stay on their toes to keep ICANN from totally succumbing to regulatory capture, at least in the gTLD space there s some degree of control over price gouging. On the other hand, ccTLDs have no effective controls over their pricing. For example, in 2008 the Seychelles increased the price of `.sc` domain names from US$25 to US$75. No reason, no warning, just pay up .

Who Is Even Getting That Money? A closely related concern about ccTLDs is that some of the cool ones are assigned to countries that are not great. The poster child for this is almost certainly Libya, which has the ccTLD ly . While Libya was being run by a terrorist-supporting extremist, companies thought it was a great idea to have domain names that ended in `.ly`. These domain registrations weren t (and aren t) cheap, and it s hard to imagine that at least some of that money wasn t going to benefit the Gaddafi regime. Similarly, the British Indian Ocean Territory, which has the io ccTLD, was created in a colonialist piece of chicanery that expelled thousands of native Chagossians from Diego Garcia. Money from the registration of `.io` domains doesn t go to the (former) residents of the Chagos islands, instead it gets paid to the UK government. Again, I m not trying to suggest that all gTLD operators are wonderful people, but it s not particularly likely that the direct beneficiaries of the operation of a gTLD stole an island chain and evicted the residents.

Are ccTLDs Ever Useful? The answer to that question is an unqualified maybe . I certainly don t think it s a good idea to register a domain under a ccTLD for vanity purposes: because it makes a word, is the same as a file extension you like, or because it looks cool. Those ccTLDs that clearly represent and are associated with a particular country are more likely to be OK, because there is less impetus for the registry to try a naked cash grab. Unfortunately, ccTLD registries have a disconcerting habit of changing their minds on whether they serve their geographic locality, such as when auDA decided to declare an open season in the `.au` namespace some years ago. Essentially, while a ccTLD may have geographic connotations now, there s not a lot of guarantee that they won t fall victim to scope creep in the future. Finally, it might be somewhat safer to register under a ccTLD if you live in the location involved. At least then you might have a better idea of whether your domain is likely to get pulled out from underneath you. Unfortunately, as the `.eu` example shows, living somewhere today is no guarantee you ll still be living there tomorrow, even if you don t move house. In short, I d suggest sticking to gTLDs. They re at least lower risk than ccTLDs.

+1, Helpful If you ve found this post informative, why not buy me a refreshing beverage? My typing fingers (both of them) thank you in advance for your generosity.

Footnotes

don t make the mistake of thinking that I approve of ICANN or how it operates; it s an omnishambles of poor governance and incomprehensible decision-making.

corresponding roughly, though not precisely (because everything has to be complicated, because humans are complicated), to the entries in the ISO standard for Codes for the representation of names of countries and their subdivisions , ISO 3166.

yes, the EU is not a country; it s part of the roughly, though not precisely caveat mentioned previously.

despite what domain registrars try very hard to imply, without falling foul of deceptive advertising regulations.

12 February 2024

Freexian Collaborators: Monthly report about Debian Long Term Support, January 2024 (by Roberto C. S nchez)

Like each month, have a look at the work funded by Freexian s Debian LTS offering.

Debian LTS contributors In January, 16 contributors have been paid to work on Debian LTS, their reports are available:

Abhijith PA did 14.0h (out of 7.0h assigned and 7.0h from previous period).

Bastien Roucari s did 22.0h (out of 16.0h assigned and 6.0h from previous period).

Ben Hutchings did 14.5h (out of 8.0h assigned and 16.0h from previous period), thus carrying over 9.5h to the next month.

Chris Lamb did 18.0h (out of 18.0h assigned).

Daniel Leidert did 10.0h (out of 10.0h assigned).

Emilio Pozuelo Monfort did 10.0h (out of 14.75h assigned and 27.0h from previous period), thus carrying over 31.75h to the next month.

Guilhem Moulin did 9.75h (out of 25.0h assigned), thus carrying over 15.25h to the next month.

Holger Levsen did 3.5h (out of 12.0h assigned), thus carrying over 8.5h to the next month.

Markus Koschany did 40.0h (out of 40.0h assigned).

Roberto C. S nchez did 8.75h (out of 9.5h assigned and 2.5h from previous period), thus carrying over 3.25h to the next month.

Santiago Ruano Rinc n did 13.5h (out of 8.25h assigned and 7.75h from previous period), thus carrying over 2.5h to the next month.

Sean Whitton did 0.5h (out of 0.25h assigned and 5.75h from previous period), thus carrying over 5.5h to the next month.

Sylvain Beucler did 9.5h (out of 23.25h assigned and 18.5h from previous period), thus carrying over 32.25h to the next month.

Thorsten Alteholz did 14.0h (out of 14.0h assigned).

Tobias Frost did 12.0h (out of 10.25h assigned and 1.75h from previous period).

Utkarsh Gupta did 8.5h (out of 35.75h assigned), thus carrying over 24.75h to the next month.

Evolution of the situation In January, we have released 25 DLAs. A variety of particularly notable packages were updated during January. Among those updates were the Linux kernel (both versions 5.10 and 4.19), mariadb-10.3, openjdk-11, firefox-esr, and thunderbird. In addition to the many other LTS package updates which were released in January, LTS contributors continue their efforts to make impactful contributions both within the Debian community.

Thanks to our sponsors Sponsors that joined recently are in bold.

Platinum sponsors:

TOSHIBA (for 101 months)

Civil Infrastructure Platform (CIP) (for 69 months)

Gold sponsors:

Roche Diagnostics International AG (for 112 months)

Linode (for 106 months)

Babiel GmbH (for 95 months)

Plat Home (for 95 months)

CINECA (for 69 months)

University of Oxford (for 51 months)

Deveryware (for 38 months)

VyOS Inc (for 33 months)

EDF SA (for 22 months)

Silver sponsors:

Domeneshop AS (for 116 months)

Nantes M tropole (for 110 months)

Univention GmbH (for 102 months)

Universit Jean Monnet de St Etienne (for 102 months)

Ribbon Communications, Inc. (for 96 months)

Exonet B.V. (for 86 months)

Leibniz Rechenzentrum (for 80 months)

Minist re de l Europe et des Affaires trang res (for 63 months)

Cloudways by DigitalOcean (for 53 months)

Dinahosting SL (for 51 months)

Bauer Xcel Media Deutschland KG (for 45 months)

Platform.sh SAS (for 45 months)

Moxa Inc. (for 39 months)

sipgate GmbH (for 36 months)

OVH US LLC (for 34 months)

Tilburg University (for 34 months)

GSI Helmholtzzentrum f r Schwerionenforschung GmbH (for 26 months)

Soliton Systems K.K. (for 23 months)

Bronze sponsors:

Evolix (for 117 months)

Seznam.cz, a.s. (for 117 months)

Intevation GmbH (for 114 months)

Linuxhotel GmbH (for 114 months)

Daevel SARL (for 112 months)

Bitfolk LTD (for 111 months)

Megaspace Internet Services GmbH (for 111 months)

Greenbone AG (for 110 months)

NUMLOG (for 110 months)

WinGo AG (for 110 months)

Ecole Centrale de Nantes - LHEEA (for 106 months)

Entr ouvert (for 101 months)

Adfinis AG (for 98 months)

GNI MEDIA (for 93 months)

Laboratoire LEGI - UMR 5519 / CNRS (for 93 months)

Tesorion (for 93 months)

Bearstech (for 84 months)

LiHAS (for 84 months)

Catalyst IT Ltd (for 79 months)

Supagro (for 74 months)

Demarcq SAS (for 73 months)

Universit Grenoble Alpes (for 59 months)

TouchWeb SAS (for 51 months)

SPiN AG (for 48 months)

CoreFiling (for 43 months)

Institut des sciences cognitives Marc Jeannerod (for 38 months)

Observatoire des Sciences de l Univers de Grenoble (for 35 months)

Tem Innovations GmbH (for 30 months)

WordFinder.pro (for 29 months)

CNRS DT INSU R sif (for 28 months)

Alter Way (for 21 months)

Institut Camille Jordan (for 10 months)

11 February 2024

Marco d'Itri: Extending access to the systemd RuntimeDirectory with a POSIX ACL

inn2 uses ephemeral UNIX domain sockets in /run/news/ to communicate with the ctlinnd program. Since the directory is only writeable by the "news" user, other unprivileged users are not able to use the command. I solved this by extending the inn2.service systemd unit with a drop-in file which uses setfacl to give access to my user "md" to the RuntimeDirectory created by systemd. This is the content of /etc/systemd/system/inn2.service.d/md-ctlinnd.conf:

[Service]
# innd will change the permissions of /run/news/ when started: without
# creating it now with mode 0775 then that will change the ACL mask.
RuntimeDirectoryMode=0775
# allow user md to run ctlinnd(8), which creates sockets in /run/news/
ExecStartPost=/usr/bin/setfacl --modify user:md:rwx $RUNTIME_DIRECTORY

The non-obvious issue here is that the innd daemon on startup will change the directory permissions in a way which sets a more restrictive (non group-writeable) ACL mask, and this would make the newly created user ACL ineffective. The solution is to create the directory group-writeable from start. (Beware: this creates a trivial privileges escalation from md to news.)

7 February 2024

Reproducible Builds: Reproducible Builds in January 2024

Welcome to the January 2024 report from the Reproducible Builds project. In these reports we outline the most important things that we have been up to over the past month. If you are interested in contributing to the project, please visit our Contribute page on our website.

How we executed a critical supply chain attack on PyTorch John Stawinski and Adnan Khan published a lengthy blog post detailing how they executed a supply-chain attack against PyTorch, a popular machine learning platform used by titans like Google, Meta, Boeing, and Lockheed Martin :
Our exploit path resulted in the ability to upload malicious PyTorch releases to GitHub, upload releases to [Amazon Web Services], potentially add code to the main repository branch, backdoor PyTorch dependencies the list goes on. In short, it was bad. Quite bad.
The attack pivoted on PyTorch s use of self-hosted runners as well as submitting a pull request to address a trivial typo in the project s `README` file to gain access to repository secrets and API keys that could subsequently be used for malicious purposes.

New Arch Linux forensic filesystem tool On our mailing list this month, long-time Reproducible Builds developer kpcyrd announced a new tool designed to forensically analyse Arch Linux filesystem images. Called `archlinux-userland-fs-cmp`, the tool is supposed to be used from a rescue image (any Linux) with an Arch install mounted to, [for example], `/mnt`. Crucially, however, at no point is any file from the mounted filesystem eval d or otherwise executed. Parsers are written in a memory safe language. More information about the tool can be found on their announcement message, as well as on the tool s homepage. A GIF of the tool in action is also available.

Issues with our `SOURCE_DATE_EPOCH` code? Chris Lamb started a thread on our mailing list summarising some potential problems with the source code snippet the Reproducible Builds project has been using to parse the `SOURCE_DATE_EPOCH` environment variable:
I m not 100% sure who originally wrote this code, but it was probably sometime in the ~2015 era, and it must be in a huge number of codebases by now. Anyway, Alejandro Colomar was working on the shadow security tool and pinged me regarding some potential issues with the code. You can see this conversation here.
Chris ends his message with a request that those with intimate or low-level knowledge of `time_t`, C types, overflows and the various parsing libraries in the C standard library (etc.) contribute with further info.

Distribution updates In Debian this month, Roland Clobus posted another detailed update of the status of reproducible ISO images on our mailing list. In particular, Roland helpfully summarised that all major desktops build reproducibly with bullseye, bookworm, trixie and sid provided they are built for a second time within the same DAK run (i.e. [within] 6 hours) . Additionally 7 of the 8 bookworm images from the official download link build reproducibly at any later time. In addition to this, three reviews of Debian packages were added, 17 were updated and 15 were removed this month adding to our knowledge about identified issues. Elsewhere, Bernhard posted another monthly update for his work elsewhere in openSUSE.

Community updates There were made a number of improvements to our website, including Bernhard M. Wiedemann fixing a number of typos of the term nondeterministic . [ ] and Jan Zerebecki adding a substantial and highly welcome section to our page about `SOURCE_DATE_EPOCH` to document its interaction with distribution rebuilds. [ ].
diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made a number of changes such as uploading versions `254` and `255` to Debian but focusing on triaging and/or merging code from other contributors. This included adding support for comparing eXtensible ARchive (.XAR/.PKG) files courtesy of Seth Michael Larson [ ][ ], as well considerable work from Vekhir in order to fix compatibility between various and subtle incompatible versions of the progressbar libraries in Python [ ][ ][ ][ ]. Thanks!

Reproducibility testing framework The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In January, a number of changes were made by Holger Levsen:

Debian-related changes:

Reduce the number of `arm64` architecture workers from 24 to 16. [ ]

Use diffoscope from the Debian release being tested again. [ ]

Improve the handling when killing unwanted processes [ ][ ][ ] and be more verbose about it, too [ ].

Don t mark a job as failed if process marked as to-be-killed is already gone. [ ]

Display the architecture of builds that have been running for more than 48 hours. [ ]

Reboot `arm64` nodes when they hit an OOM (out of memory) state. [ ]

Package rescheduling changes:

Reduce IRC notifications to 1 when rescheduling due to package status changes. [ ]

Correctly set `SUDO_USER` when rescheduling packages. [ ]

Automatically reschedule packages regressing to FTBFS (build failure) or FTBR (build success, but unreproducible). [ ]

OpenWrt-related changes:

Install the `python3-dev` and `python3-pyelftools` packages as they are now needed for the `sunxi` target. [ ][ ]

Also install the `libpam0g-dev` which is needed by some OpenWrt hardware targets. [ ]

Misc:

As it s January, set the `real_year` variable to 2024 [ ] and bump various copyright years as well [ ].

Fix a large (!) number of spelling mistakes in various scripts. [ ][ ][ ]

Prevent Squid and Systemd processes from being killed by the kernel s OOM killer. [ ]

Install the `iptables` tool everywhere, else our custom `rc.local` script fails. [ ]

Cleanup the `/srv/workspace/pbuilder` directory on boot. [ ]

Automatically restart Squid if it fails. [ ]

Limit the execution of `chroot-installation` jobs to a maximum of 4 concurrent runs. [ ][ ]

Significant amounts of node maintenance was performed by Holger Levsen (eg. [ ][ ][ ][ ][ ][ ][ ] etc.) and Vagrant Cascadian (eg. [ ][ ][ ][ ][ ][ ][ ][ ]). Indeed, Vagrant Cascadian handled an extended power outage for the network running the Debian `armhf` architecture test infrastructure. This provided the incentive to replace the UPS batteries and consolidate infrastructure to reduce future UPS load. [ ] Elsewhere in our infrastructure, however, Holger Levsen also adjusted the email configuration for `@reproducible-builds.org` to deal with a new SMTP email attack. [ ]

Upstream patches The Reproducible Builds project tries to detects, dissects and fix as many (currently) unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Bernhard M. Wiedemann:

`cython` (nondeterminstic path issue)

`deluge` (issue with modification time of `.egg` file)

`gap-ferret`, `gap-semigroups` & `gap-simpcomp` (nondeterministic `config.log` file)

`grpc` (filesystem ordering issue )

`hub` (random)

`kubernetes1.22` & `kubernetes1.23` (sort-related issue)

`kubernetes1.24` & `kubernetes1.25` (`go -trimpath` vs random issue)

`libjcat` (drop test files with random bytes)

`luajit` (Use new `d` option for deterministic bytecode output)

`meson` [ ][ ] (sort the results from Python filesystem call)

`python-rjsmin` (drop GCC instrumentation artifacts)

`qt6-virtualkeyboard+others` (bug parallelism/race)

`SoapySDR` (parallelism-related issue)

`systemd` (sorting problem)

`warewulf` (CPIO modification time issue, etc.)

Chris Lamb:

#1060254 filed against `mumble`.

James Addison:

`guake` ( Schroedinger file due to race condition)

`qhelpgenerator-qt5` (timezone localization; fix also merged upstream for QT6)

`sphinx` (search index `doctitle` sorting)

Separate to this, Vagrant Cascadian followed up with the relevant maintainers when reproducibility fixes were not included in newly-uploaded versions of the `mm-common` package in Debian this was quickly fixed, however. [ ]

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

IRC: `#reproducible-builds` on `irc.oftc.net`.

Mailing list: `rb-general@lists.reproducible-builds.org`

Mastodon: @reproducible_builds

Twitter: @ReproBuilds

2 February 2024

Ian Jackson: UPS, the Useless Parcel Service; VAT and fees

I recently had the most astonishingly bad experience with UPS, the courier company. They severely damaged my parcels, and were very bad about UK import VAT, ultimately ending up harassing me on autopilot. The only thing that got their attention was my draft Particulars of Claim for intended legal action. Surprisingly, I got them to admit in writing that the disbursement fee they charge recipients alongside the actual VAT, is just something they made up with no legal basis. What happened Autumn last year I ordered some furniture from a company in Germany. This was to be shipped by them to me by courier. The supplier chose UPS. UPS misrouted one of the three parcels to Denmark. When everything arrived, it had been sat on by elephants. The supplier had to replace most of it, with considerable inconvenience and delay to me, and of course a loss to the supplier. But this post isn t mostly about that. This post is about VAT. You see, import VAT was due, because of fucking Brexit. UPS made a complete hash of collecting that VAT. Their computers can t issue coherent documents, their email helpdesk is completely useless, and their automated debt collection systems run along uninfluenced by any external input. The crazy, including legal threats and escalating late payment fees, continued even after I paid the VAT discrepancy (which I did despite them not yet having provided any coherent calculation for it). This kind of behaviour is a very small and mild version of the kind of things British Gas did to Lisa Ferguson, who eventually won substantial damages for harassment, plus 10K of costs. Having tried asking nicely, and sending stiff letters, I too threatened litigation. I would have actually started a court claim, but it would have included a claim under the Protection from Harassment Act. Those have to be filed under the Part 8 procedure , which involves sending all of the written evidence you re going to use along with the claim form. Collating all that would be a good deal of work, especially since UPS and ControlAccount didn t engage with me at all, so I had no idea which things they might actually dispute. So I decided that before issuing proceedings, I d send them a copy of my draft Particulars of Claim, along with an offer to settle if they would pay me a modest sum and stop being evil robots at me. Rather than me typing the whole tale in again, you can read the full gory details in the PDF of my draft Particulars of Claim. (I ve redacted the reference numbers). Outcome The draft Particulars finally got their attention. UPS sent me an offer: they agreed to pay me 50, in full and final settlement. That was close enough to my offer that I accepted it. I mostly wanted them to stop, and they do seem to have done so. And I ve received the 50. VAT calculation They also finally included an actual explanation of the VAT calculation. It s absurd, but it s not UPS s absurd:

The clearance was entered initially with estimated import charges of 400.03, consisting of 387.83 VAT, and 12.20 disbursement fee. This original entry regrettably did not include the freight cost for calculating the VAT, and as such when submitted for final entry the VAT value was adjusted to include this and an amended invoice was issued for an additional 39.84. HMRC calculate the amount against which VAT is raised using the value of goods, insurance and freight, however they also may apply a VAT adjustment figure. The VAT Adjustment is based on many factors (Incidental costs in regards to a shipment), which includes charge for currency conversion if the invoice does not list values in Sterling, but the main is due to the inland freight from airport of destination to the final delivery point, as this charge varies, for example, from EMA to Edinburgh would be 150, from EMA to Derby would be 1, so each year UPS must supply HMRC with all values incurred for entry build up and they give an average which UPS have to use on the entry build up as the VAT Adjustment. The correct calculation for the import charges is therefore as follows: Goods value divided by exchange rate 2,489.53 EUR / 1.1683 = 2,130.89 GBP Duty: Goods value plus freight (%) 2,130.89 GBP + 5% = 2,237.43 GBP. That total times the duty rate. X 0 % = 0 GBP VAT: Goods value plus freight (100%) 2,130.89 GBP + 0 = 2,130.89 GBP That total plus duty and VAT adjustment 2,130.89 GBP + 0 GBP + 7.49 GBP = 2,348.08 GBP. That total times 20% VAT = 427.67 GBP As detailed above we must confirm that the final VAT charges applied to the shipment were correct, and that no refund of this is therefore due.

This looks very like HMRC-originated nonsense. If only they had put it on the original bills! It s completely ridiculous that it took four months and near-litigation to obtain it. Disbursement fee One more thing. UPS billed me a 12 disbursement fee . When you import something, there s often tax to pay. The courier company pays that to the government, and the consignee pays it to the courier. Usually the courier demands it before final delivery, since otherwise they end up having to chase it as a debt. It is common for parcel companies to add a random fee of their own. As I note in my Particulars, there isn t any legal basis for this. In my own offer of settlement I proposed that UPS should:

State under what principle of English law (such as, what enactment or principle of Common Law), you levy the disbursement fee (or refund it).

To my surprise they actually responded to this in their own settlement letter. (They didn t, for example, mention the harassment at all.) They said (emphasis mine):

A disbursement fee is a fee for amounts paid or processed on behalf of a client. It is an established category of charge used by legal firms, amongst other companies, for billing of various ancillary costs which may be incurred in completion of service. Disbursement fees are not covered by a specific law, nor are they legally prohibited. Regarding UPS disbursement fee this is an administrative charge levied for the use of UPS deferment account to prepay import charges for clearance through CDS. This charge would therefore be billed to the party that is responsible for the import charges, normally the consignee or receiver of the shipment in question. The disbursement fee as applied is legitimate, and as you have stated is a commonly used and recognised charge throughout the courier industry, and I can confirm that this was charged correctly in this instance.

On UPS s analysis, they can just make up whatever fee they like. That is clearly not right (and I don t even need to refer to consumer protection law, which would also make it obviously unlawful). And, that everyone does it doesn t make it lawful. There are so many things that are ubiquitous but unlawful, especially nowadays when much of the legal system - especially consumer protection regulators - has been underfunded to beyond the point of collapse. Next time this comes up I might have a go at getting the fee back. (Obviously I ll have to pay it first, to get my parcel.) ParcelForce and Royal Mail I think this analysis doesn t apply to ParcelForce and (probably) Royal Mail. I looked into this in 2009, and I found that Parcelforce had been given the ability to write their own private laws: Schemes made under section 89 of the Postal Services Act 2000. This is obviously ridiculous but I think it was the law in 2009. I doubt the intervening governments have fixed it. Furniture Oh, yes, the actual furniture. The replacements arrived intact and are great :-).

comments

30 January 2024

Matthew Palmer: Why Certificate Lifecycle Automation Matters

If you ve perused the ActivityPub feed of certificates whose keys are known to be compromised, and clicked on the Show More button to see the name of the certificate issuer, you may have noticed that some issuers seem to come up again and again. This might make sense after all, if a CA is issuing a large volume of certificates, they ll be seen more often in a list of compromised certificates. In an attempt to see if there is anything that we can learn from this data, though, I did a bit of digging, and came up with some illuminating results.

The Procedure I started off by finding all the unexpired certificates logged in Certificate Transparency (CT) logs that have a key that is in the pwnedkeys database as having been publicly disclosed. From this list of certificates, I removed duplicates by matching up issuer/serial number tuples, and then reduced the set by counting the number of unique certificates by their issuer. This gave me a list of the issuers of these certificates, which looks a bit like this:

/C=BE/O=GlobalSign nv-sa/CN=AlphaSSL CA - SHA256 - G4
/C=GB/ST=Greater Manchester/L=Salford/O=Sectigo Limited/CN=Sectigo RSA Domain Validation Secure Server CA
/C=GB/ST=Greater Manchester/L=Salford/O=Sectigo Limited/CN=Sectigo RSA Organization Validation Secure Server CA
/C=US/ST=Arizona/L=Scottsdale/O=GoDaddy.com, Inc./OU=http://certs.godaddy.com/repository//CN=Go Daddy Secure Certificate Authority - G2
/C=US/ST=Arizona/L=Scottsdale/O=Starfield Technologies, Inc./OU=http://certs.starfieldtech.com/repository//CN=Starfield Secure Certificate Authority - G2
/C=AT/O=ZeroSSL/CN=ZeroSSL RSA Domain Secure Site CA
/C=BE/O=GlobalSign nv-sa/CN=GlobalSign GCC R3 DV TLS CA 2020

Rather than try to work with raw issuers (because, as Andrew Ayer says, The SSL Certificate Issuer Field is a Lie), I mapped these issuers to the organisations that manage them, and summed the counts for those grouped issuers together.

The Data
Insert obligatory "not THAT data" comment here
The end result of this work is the following table, sorted by the count of certificates which have been compromised by exposing their private key:

Issuer Compromised Count

Sectigo 170

ISRG (Let's Encrypt) 161

GoDaddy 141

DigiCert 81

GlobalSign 46

Entrust 3

SSL.com 1

If you re familiar with the CA ecosystem, you ll probably recognise that the organisations with large numbers of compromised certificates are also those who issue a lot of certificates. So far, nothing particularly surprising, then. Let s look more closely at the relationships, though, to see if we can get more useful insights.

Issuer	Compromised Count
Sectigo	170
ISRG (Let's Encrypt)	161
GoDaddy	141
DigiCert	81
GlobalSign	46
Entrust	3
SSL.com	1

Volume Control Using the issuance volume report from crt.sh, we can compare issuance volumes to compromise counts, to come up with a compromise rate . I m using the Unexpired Precertificates colume from the issuance volume report, as I feel that s the number that best matches the certificate population I m examining to find compromised certificates. To maintain parity with the previous table, this one is still sorted by the count of certificates that have been compromised.

Issuer Issuance Volume Compromised Count Compromise Rate

Sectigo 88,323,068 170 1 in 519,547

ISRG (Let's Encrypt) 315,476,402 161 1 in 1,959,480

GoDaddy 56,121,429 141 1 in 398,024

DigiCert 144,713,475 81 1 in 1,786,586

GlobalSign 1,438,485 46 1 in 31,271

Entrust 23,166 3 1 in 7,722

SSL.com 171,816 1 1 in 171,816

If we now sort this table by compromise rate, we can see which organisations have the most (and least) leakiness going on from their customers:

Issuer Issuance Volume Compromised Count Compromise Rate

Entrust 23,166 3 1 in 7,722

GlobalSign 1,438,485 46 1 in 31,271

SSL.com 171,816 1 1 in 171,816

GoDaddy 56,121,429 141 1 in 398,024

Sectigo 88,323,068 170 1 in 519,547

DigiCert 144,713,475 81 1 in 1,786,586

ISRG (Let's Encrypt) 315,476,402 161 1 in 1,959,480

By grouping by order-of-magnitude in the compromise rate, we can identify three bands :

The Super Leakers: Customers of Entrust and GlobalSign seem to love to lose control of their private keys. For Entrust, at least, though, the small volumes involved make the numbers somewhat untrustworthy. The three compromised certificates could very well belong to just one customer, for instance. I m not aware of anything that GlobalSign does that would make them such an outlier, either, so I m inclined to think they just got unlucky with one or two customers, but as CAs don t include customer IDs in the certificates they issue, it s not possible to say whether that s the actual cause or not.

The Regular Leakers: Customers of SSL.com, GoDaddy, and Sectigo all have compromise rates in the 1-in-hundreds-of-thousands range. Again, the low volumes of SSL.com make the numbers somewhat unreliable, but the other two organisations in this group have large enough numbers that we can rely on that data fairly well, I think.

The Low Leakers: Customers of DigiCert and Let s Encrypt are at least three times less likely than customers of the regular leakers to lose control of their private keys. Good for them!

Now we have some useful insights we can think about.

Issuer	Issuance Volume	Compromised Count	Compromise Rate
Sectigo	88,323,068	170	1 in 519,547
ISRG (Let's Encrypt)	315,476,402	161	1 in 1,959,480
GoDaddy	56,121,429	141	1 in 398,024
DigiCert	144,713,475	81	1 in 1,786,586
GlobalSign	1,438,485	46	1 in 31,271
Entrust	23,166	3	1 in 7,722
SSL.com	171,816	1	1 in 171,816

Why Is It So?
If you don't know who Professor Julius Sumner Miller is, I highly recommend finding out
All of the organisations on the list, with the exception of Let s Encrypt, are what one might term traditional CAs. To a first approximation, it s reasonable to assume that the vast majority of the customers of these traditional CAs probably manage their certificates the same way they have for the past two decades or more. That is, they generate a key and CSR, upload the CSR to the CA to get a certificate, then copy the cert and key somewhere. Since humans are handling the keys, there s a higher risk of the humans using either risky practices, or making a mistake, and exposing the private key to the world. Let s Encrypt, on the other hand, issues all of its certificates using the ACME (Automatic Certificate Management Environment) protocol, and all of the Let s Encrypt documentation encourages the use of software tools to generate keys, issue certificates, and install them for use. Given that Let s Encrypt has 161 compromised certificates currently in the wild, it s clear that the automation in use is far from perfect, but the significantly lower compromise rate suggests to me that lifecycle automation at least reduces the rate of key compromise, even though it doesn t eliminate it completely.

Sidebar: ACME Does Not Currently Rule The World It is true that all of the organisations in this analysis also provide ACME issuance workflows, should customers desire it. However, the traditional CA companies have been around a lot longer than ACME has, and so they probably acquired many of their customers before ACME existed. Given that it s incredibly hard to get humans to change the way they do things, once they have a way that works , it seems reasonable to assume that most of the certificates issued by these CAs are handled in the same human-centric, error-prone manner they always have been. If organisations would like to refute this assumption, though, by sharing their data on ACME vs legacy issuance rates, I m sure we d all be extremely interested.

Explaining the Outlier The difference in presumed issuance practices would seem to explain the significant difference in compromise rates between Let s Encrypt and the other organisations, if it weren t for one outlier. This is a largely traditional CA, with the manual-handling issues that implies, but with a compromise rate close to that of Let s Encrypt. We are, of course, talking about DigiCert. The thing about DigiCert, that doesn t show up in the raw numbers from crt.sh, is that DigiCert manages the issuance of certificates for several of the biggest hosted TLS providers, such as CloudFlare and AWS. When these services obtain a certificate from DigiCert on their customer s behalf, the private key is kept locked away, and no human can (we hope) get access to the private key. This is supported by the fact that no certificates identifiably issued to either CloudFlare or AWS appear in the set of certificates with compromised keys. When we ask for all certificates issued by DigiCert , we get both the certificates issued to these big providers, which are very good at keeping their keys under control, as well as the certificates issued to everyone else, whose key handling practices may not be quite so stringent. It s possible, though not trivial, to account for certificates issued to these hosted TLS providers, because the certificates they use are issued from intermediates branded to those companies. With the crt.sh psql interface we can run this query to get the total number of unexpired precertificates issued to these managed services:

SELECT SUM(sub.NUM_ISSUED[2] - sub.NUM_EXPIRED[2])
  FROM (
    SELECT ca.name, max(coalesce(coalesce(nullif(trim(cc.SUBORDINATE_CA_OWNER), ''), nullif(trim(cc.CA_OWNER), '')), cc.INCLUDED_CERTIFICATE_OWNER)) as OWNER,
           ca.NUM_ISSUED, ca.NUM_EXPIRED
      FROM ccadb_certificate cc, ca_certificate cac, ca
     WHERE cc.CERTIFICATE_ID = cac.CERTIFICATE_ID
       AND cac.CA_ID = ca.ID
  GROUP BY ca.ID
  ) sub
 WHERE sub.name ILIKE '%Amazon%' OR sub.name ILIKE '%CloudFlare%' AND sub.owner = 'DigiCert';

The number I get from running that query is 104,316,112, which should be subtracted from DigiCert s total issuance figures to get a more accurate view of what DigiCert s regular customers do with their private keys. When I do this, the compromise rates table, sorted by the compromise rate, looks like this:

Issuer	Issuance Volume	Compromised Count	Compromise Rate
Entrust	23,166	3	1 in 7,722
GlobalSign	1,438,485	46	1 in 31,271
SSL.com	171,816	1	1 in 171,816
GoDaddy	56,121,429	141	1 in 398,024
"Regular" DigiCert	40,397,363	81	1 in 498,732
Sectigo	88,323,068	170	1 in 519,547
All DigiCert	144,713,475	81	1 in 1,786,586
ISRG (Let's Encrypt)	315,476,402	161	1 in 1,959,480

In short, it appears that DigiCert s regular customers are just as likely as GoDaddy or Sectigo customers to expose their private keys.

What Does It All Mean? The takeaway from all this is fairly straightforward, and not overly surprising, I believe.
The less humans have to do with certificate issuance, the less likely they are to compromise that certificate by exposing the private key. While it may not be surprising, it is nice to have some empirical evidence to back up the common wisdom. Fully-managed TLS providers, such as CloudFlare, AWS Certificate Manager, and whatever Azure s thing is called, is the platonic ideal of this principle: never give humans any opportunity to expose a private key. I m not saying you should use one of these providers, but the security approach they have adopted appears to be the optimal one, and should be emulated universally. The ACME protocol is the next best, in that there are a variety of standardised tools widely available that allow humans to take themselves out of the loop, but it s still possible for humans to handle (and mistakenly expose) key material if they try hard enough. Legacy issuance methods, which either cannot be automated, or require custom, per-provider automation to be developed, appear to be at least four times less helpful to the goal of avoiding compromise of the private key associated with a certificate.

Humans Are, Of Course, The Problem
No thanks, Bender, I'm busy tonight
This observation that if you don t let humans near keys, they don t get leaked is further supported by considering the biggest issuers by volume who have not issued any certificates whose keys have been compromised: Google Trust Services (fourth largest issuer overall, with 57,084,529 unexpired precertificates), and Microsoft Corporation (sixth largest issuer overall, with 22,852,468 unexpired precertificates). It appears that somewhere between most and basically all of the certificates these organisations issue are to customers of their public clouds, and my understanding is that the keys for these certificates are managed in same manner as CloudFlare and AWS the keys are locked away where humans can t get to them. It should, of course, go without saying that if a human can never have access to a private key, it makes it rather difficult for a human to expose it. More broadly, if you are building something that handles sensitive or secret data, the more you can do to keep humans out of the loop, the better everything will be.

Your Support is Appreciated If you d like to see more analysis of how key compromise happens, and the lessons we can learn from examining billions of certificates, please show your support by buying me a refreshing beverage. Trawling CT logs is thirsty work.

Appendix: Methodology Limitations In the interests of clarity, I feel it s important to describe ways in which my research might be flawed. Here are the things I know of that may have impacted the accuracy, that I couldn t feasibly account for.

Time Periods: Because time never stops, there is likely to be some slight mismatches in the numbers obtained from the various data sources, because they weren t collected at exactly the same moment.

Issuer-to-Organisation Mapping: It s possible that the way I mapped issuers to organisations doesn t match exactly with how crt.sh does it, meaning that counts might be skewed. I tried to minimise that by using the same data sources (the CCADB AllCertificates report) that I believe that crt.sh uses for its mapping, but I cannot be certain of a perfect match.

Unwarranted Grouping: I ve drawn some conclusions about the practices of the various organisations based on their general approach to certificate issuance. If a particular subordinate CA that I ve grouped into the parent organisation is managed in some unusual way, that might cause my conclusions to be erroneous. I was able to fairly easily separate out CloudFlare, AWS, and Azure, but there are almost certainly others that I didn t spot, because hoo boy there are a lot of intermediate CAs out there.

28 January 2024

Niels Thykier: Annotating the Debian packaging directory

In my previous blog post Providing online reference documentation for debputy, I made a point about how debhelper documentation was suboptimal on account of being static rather than online. The thing is that debhelper is not alone in this problem space, even if it is a major contributor to the number of packaging files you have to to know about. If we look at the "competition" here such as Fedora and Arch Linux, they tend to only have one packaging file. While most Debian people will tell you a long list of cons about having one packaging file (such a Fedora's spec file being 3+ domain specific languages "mashed" into one file), one major advantage is that there is only "the one packaging file". You only need to remember where to find the documentation for one file, which is great when you are running on wetware with limited storage capacity. Which means as a newbie, you can dedicate less mental resources to tracking multiple files and how they interact and more effort understanding the "one file" at hand. I started by asking myself how can we in Debian make the packaging stack more accessible to newcomers? Spoiler alert, I dug myself into rabbit hole and ended up somewhere else than where I thought I was going. I started by wanting to scan the debian directory and annotate all files that I could with documentation links. The logic was that if debputy could do that for you, then you could spend more mental effort elsewhere. So I combined debputy's packager provided files detection with a static list of files and I quickly had a good starting point for debputy-based packages.

Adding (non-static) dpkg and debhelper files to the mix Now, I could have closed the topic here and said "Look, I did debputy files plus couple of super common files". But I decided to take it a bit further. I added support for handling some dpkg files like packager provided files (such as debian/substvars and debian/symbols). But even then, we all know that debhelper is the big hurdle and a major part of the omission... In another previous blog post (A new Debian package helper: debputy), I made a point about how debputy could list all auxiliary files while debhelper could not. This was exactly the kind of feature that I would need for this feature, if this feature was to cover debhelper. Now, I also remarked in that blog post that I was not willing to maintain such a list. Also, I may have ranted about static documentation being unhelpful for debhelper as it excludes third-party provided tooling. Fortunately, a recent update to dh_assistant had provided some basic plumbing for loading dh sequences. This meant that getting a list of all relevant commands for a source package was a lot easier than it used to be. Once you have a list of commands, it would be possible to check all of them for dh's NOOP PROMISE hints. In these hints, a command can assert it does nothing if a given pkgfile is not present. This lead to the new dh_assistant list-guessed-dh-config-files command that will list all declared pkgfiles and which helpers listed them. With this combined feature set in place, debputy could call dh_assistant to get a list of pkgfiles, pretend they were packager provided files and annotate those along with manpage for the relevant debhelper command. The exciting thing about letting debpputy resolve the pkgfiles is that debputy will resolve "named" files automatically (debhelper tools will only do so when --name is passed), so it is much more likely to detect named pkgfiles correctly too. Side note: I am going to ignore the elephant in the room for now, which is dh_installsystemd and its package@.service files and the wide-spread use of debian/foo.service where there is no package called foo. For the latter case, the "proper" name would be debian/pkg.foo.service. With the new dh_assistant feature done and added to debputy, debputy could now detect the ubiquitous debian/install file. Excellent. But less great was that the very common debian/docs file was not. Turns out that dh_installdocs cannot be skipped by dh, so it cannot have NOOP PROMISE hints. Meh... Well, dh_assistant could learn about a new INTROSPECTABLE marker in addition to the NOOP PROMISE and then I could sprinkle that into a few commands. Indeed that worked and meant that debian/postinst (etc.) are now also detectable. At this point, debputy would be able to identify a wide range of debhelper related configuration files in debian/ and at least associate each of them with one or more commands. Nice, surely, this would be a good place to stop, right...?

Adding more metadata to the files The debhelper detected files only had a command name and manpage URI to that command. It would be nice if we could contextualize this a bit more. Like is this file installed into the package as is like debian/pam or is it a file list to be processed like debian/install. To make this distinction, I could add the most common debhelper file types to my static list and then merge the result together. Except, I do not want to maintain a full list in debputy. Fortunately, debputy has a quite extensible plugin infrastructure, so added a new plugin feature to provide this kind of detail and now I can outsource the problem! I split my definitions into two and placed the generic ones in the debputy-documentation plugin and moved the debhelper related ones to debhelper-documentation. Additionally, third-party dh addons could provide their own debputy plugin to add context to their configuration files. So, this gave birth file categories and configuration features, which described each file on different fronts. As an example, debian/gbp.conf could be tagged as a maint-config to signal that it is not directly related to the package build but more of a tool or style preference file. On the other hand, debian/install and debian/debputy.manifest would both be tagged as a pkg-helper-config. Files like debian/pam were tagged as ppf-file for packager provided file and so on. I mentioned configuration features above and those were added because, I have had a beef with debhelper's "standard" configuration file format as read by filearray and filedoublearray. They are often considered simple to understand, but it is hard to know how a tool will actually read the file. As an example, consider the following:

Will the debhelper use filearray, filedoublearray or none of them to read the file? This topic has about 2 bits of entropy.

Will the config file be executed if it is marked executable assuming you are using the right compat level? If it is executable, does dh-exec allow renaming for this file? This topic adds 1 or 2 bit of entropy depending on the context.

Will the config file be subject to glob expansions? This topic sounds like a boolean but is a complicated mess. The globs can be handled either by debhelper as it parses the file for you. In this case, the globs are applied to every token. However, this is not what dh_install does. Here the last token on each line is supposed to be a directory and therefore not subject to globs. Therefore, dh_install does the globbing itself afterwards but only on part of the tokens. So that is about 2 bits of entropy more. Actually, it gets worse...

If the file is executed, debhelper will refuse to expand globs in the output of the command, which was a deliberate design choice by the original debhelper maintainer took when he introduced the feature in debhelper/8.9.12. Except, dh_install feature interacts with the design choice and does enable glob expansion in the tool output, because it does so manually after its filedoublearray call.

So these "simple" files have way too many combinations of how they can be interpreted. I figured it would be helpful if debputy could highlight these difference, so I added support for those as well. Accordingly, debian/install is tagged with multiple tags including dh-executable-config and dh-glob-after-execute. Then, I added a datatable of these tags, so it would be easy for people to look up what they meant. Ok, this seems like a closed deal, right...?

Context, context, context However, the dh-executable-config tag among other are only applicable in compat 9 or later. It does not seem newbie friendly if you are told that this feature exist, but then have to read in the extended description that that it actually does not apply to your package. This problem seems fixable. Thanks to dh_assistant, it is easy to figure out which compat level the package is using. Then tweak some metadata to enable per compat level rules. With that tags like dh-executable-config only appears for packages using compat 9 or later. Also, debputy should be able to tell you where packager provided files like debian/pam are installed. We already have the logic for packager provided files that debputy supports and I am already using debputy engine for detecting the files. If only the plugin provided metadata gave me the install pattern, debputy would be able tell you where this file goes in the package. Indeed, a bit of tweaking later and setting install-pattern to usr/lib/pam.d/ name , debputy presented me with the correct install-path with the package name placing the name placeholder. Now, I have been using debian/pam as an example, because debian/pam is installed into usr/lib/pam.d in compat 14. But in earlier compat levels, it was installed into etc/pam.d. Well, I already had an infrastructure for doing compat file tags. Off we go to add install-pattern to the complat level infrastructure and now changing the compat level would change the path. Great. (Bug warning: The value is off-by-one in the current version of debhelper. This is fixed in git) Also, while we are in this install-pattern business, a number of debhelper config files causes files to be installed into a fixed directory. Like debian/docs which causes file to be installed into /usr/share/docs/ package . Surely, we can expand that as well and provide that bit of context too... and done. (Bug warning: The code currently does not account for the main documentation package context) It is rather common pattern for people to do debian/foo.in files, because they want to custom generation of debian/foo. Which means if you have debian/foo you get "Oh, let me tell you about debian/foo ". Then you rename it to debian/foo.in and the result is "debian/foo.in is a total mystery to me!". That is suboptimal, so lets detect those as well as if they were the original file but add a tag saying that they are a generate template and which file we suspect it generates. Finally, if you use debputy, almost all of the standard debhelper commands are removed from the sequence, since debputy replaces them. It would be weird if these commands still contributed configuration files when they are not actually going to be invoked. This mostly happened naturally due to the way the underlying dh_assistant command works. However, any file mentioned by the debhelper-documentation plugin would still appear unfortunately. So off I went to filter the list of known configuration files against which dh_ commands that dh_assistant thought would be used for this package.

Wrapping it up I was several layers into this and had to dig myself out. I have ended up with a lot of data and metadata. But it was quite difficult for me to arrange the output in a user friendly manner. However, all this data did seem like it would be useful any tool that wants to understand more about the package. So to get out of the rabbit hole, I for now wrapped all of this into JSON and now we have a debputy tool-support annotate-debian-directory command that might be useful for other tools. To try it out, you can try the following demo: In another day, I will figure out how to structure this output so it is useful for non-machine consumers. Suggestions are welcome. :)

Limitations of the approach As a closing remark, I should probably remind people that this feature relies heavily on declarative features. These include:

When determining which commands are relevant, using Build-Depends: dh-sequence-foo is much more reliable than configuring it via the Turing complete configuration we call debian/rules.

When debhelper commands use NOOP promise hints, dh_assistant can "see" the config files listed those hints, meaning the file will at least be detected. For new introspectable hint and the debputy plugin, it is probably better to wait until the dust settles a bit before adding any of those.

You can help yourself and others to better results by using the declarative way rather than using debian/rules, which is the bane of all introspection!

Russell Coker: Links January 2024

Long Now has an insightful article about domestication that considers whether humans have evolved to want to control nature [1]. The OMG Elite hacker cable is an interesting device [2]. A Wifi device in a USB cable to allow remote control and monitoring of data transfer, including remote keyboard control and sniffing. Pity that USB-C cables have chips in them so you can t use a spark to remove unwanted chips from modern cables. David Brin s blog post The core goal of tyrants: The Red-Caesar Cult and a restored era of The Great Man has some insightful points about authoritarianism [3]. Ron Garret wrote an interesting argument against Christianity [4], and a follow-up titled Why I Don t Believe in Jesus [5]. He has a link to a well written article about the different theologies of Jesus and Paul [6]. Dimitri John Ledkov wrote an interesting blog post about how they reduced disk space for Ubuntu kernel packages and RAM for the initramfs phase of boot [7]. I hope this gets copied to Debian soon. Joey Hess wrote an interesting blog post about trying to make LLM systems produce bad code if trained on his code without permission [8]. Arstechnica has an interesting summary of research into the security of fingerprint sensors [9]. Not surprising that the products of the 3 vendors that supply almost all PC fingerprint readers are easy to compromise. Bruce Schneier wrote an insightful blog post about how AI will allow mass spying (as opposed to mass surveillance) [10]. ZDnet has an informative article How to Write Better ChatGPT Prompts in 5 Steps [11]. I sent this to a bunch of my relatives. AbortRetryFail has an interesting article about the Itanic Saga [12]. Erberus sounds interesting, maybe VLIW designs could give a good ration of instructions to power unlike the Itanium which was notorious for being power hungry. Bruce Schneier wrote an insightful article about AI and Trust [13]. We really need laws controlling these things! David Brin wrote an interesting blog post on the obsession with historical cycles [14].

[1] http://tinyurl.com/2xf9wb9k
[2] http://tinyurl.com/2m82ajju
[3] http://tinyurl.com/yrjp5la6
[4] http://tinyurl.com/ytdb24xm
[5] http://tinyurl.com/2x6b7wbd
[6] https://doctrine.org/jesus-vs-paul
[7] http://tinyurl.com/yp3xqtr3
[8] https://joeyh.name/blog/entry/attribution_armored_code/
[9] http://tinyurl.com/ywus4o47
[10] http://tinyurl.com/ypzkjyfv
[11] http://tinyurl.com/yp6ldxlr
[12] https://www.abortretry.fail/p/the-itanic-saga
[13] http://tinyurl.com/yqh9wbfj
[14] http://tinyurl.com/ylue4svh

24 January 2024

Louis-Philippe V ronneau: Montreal Subway Foot Traffic Data, 2023 edition

For the fifth year in a row, I've asked Soci t de Transport de Montr al, Montreal's transit agency, for the foot traffic data of Montreal's subway. By clicking on a subway station, you'll be redirected to a graph of the station's foot traffic.

Licences

The subway map displayed on this page, the original dataset and my modified dataset are licenced under CCO 1.0: they are in the public domain.
The R code I wrote is licensed under the GPLv3+. It's pretty much the same code as last year. I tweaked last's year converter script and although I had to clean some of the data by hand, it worked pretty well.

20 January 2024

Gunnar Wolf: A deep learning technique for intrusion detection system using a recurrent neural networks based framework

This post is a review for Computing Reviews for A deep learning technique for intrusion detection system using a recurrent neural networks based framework , a article published in Computer Communications

So let s assume you already know and understand that artificial intelligence s main building blocks are perceptrons, that is, mathematical models of neurons. And you know that, while a single perceptron is too limited to get interesting information from, very interesting structures neural networks can be built with them. You also understand that neural networks can be trained with large datasets, and you can get them to become quite efficient and accurate classifiers for data comparable to your dataset. Finally, you are interested in applying this knowledge to defensive network security, particularly in choosing the right recurrent neural network (RNN) framework to create an intrusion detection system (IDS). Are you still with me? Good! This paper might be right for you! The paper builds on a robust and well-written introduction and related work sections to arrive at explaining in detail what characterizes a RNN, the focus of this work, among other configurations also known as neural networks, and why they are particularly suited for machine learning (ML) tasks. RNNs must be trained for each problem domain, and publicly available datasets are commonly used for such tasks. The authors present two labeled datasets representing normal and hostile network data, identified according to different criteria: NSL-KDD and UNSW-NB15. They proceed to show a framework to analyze and compare different RNNs and run them against said datasets, segmented for separate training and validation phases, compare results, and finally select the best available model for the task measuring both training speed as well as classification accuracy. The paper is quite heavy due to both its domain-specific terminology many acronyms are used throughout the text and its use of mathematical notation, both to explain specific properties of each of the RNN types and for explaining the preprocessing carried out for feature normalization and selection. This is partly what led me to start the first paragraph by assuming that we, as readers, already understand a large body of material if we are to fully follow the text. The paper does begin by explaining its core technologies, but quickly ramps up and might get too technical for nonexpert readers. It is undeniably an interesting and valuable read, showing the state of the art in IDS and ML-assisted technologies. It does not detail any specific technology applying its findings, but we will probably find the information conveyed here soon enough in industry publications.

Niels Thykier: Making debputy: Writing declarative parsing logic

In this blog post, I will cover how debputy parses its manifest and the conceptual improvements I did to make parsing of the manifest easier. All instructions to debputy are provided via the debian/debputy.manifest file and said manifest is written in the YAML format. After the YAML parser has read the basic file structure, debputy does another pass over the data to extract the information from the basic structure. As an example, the following YAML file:

manifest-version: "0.1"
installations:
  - install:
      source: foo
      dest-dir: usr/bin

would be transformed by the YAML parser into a structure resembling:

 
  "manifest-version": "0.1",
  "installations": [
      
       "install":  
         "source": "foo",
         "dest-dir": "usr/bin",
        
      
  ]

This structure is then what debputy does a pass on to translate this into an even higher level format where the "install" part is translated into an InstallRule. In the original prototype of debputy, I would hand-write functions to extract the data that should be transformed into the internal in-memory high level format. However, it was quite tedious. Especially because I wanted to catch every possible error condition and report "You are missing the required field X at Y" rather than the opaque KeyError: X message that would have been the default. Beyond being tedious, it was also quite error prone. As an example, in debputy/0.1.4 I added support for the install rule and you should allegedly have been able to add a dest-dir: or an as: inside it. Except I crewed up the code and debputy was attempting to look up these keywords from a dict that could never have them. Hand-writing these parsers were so annoying that it demotivated me from making manifest related changes to debputy simply because I did not want to code the parsing logic. When I got this realization, I figured I had to solve this problem better. While reflecting on this, I also considered that I eventually wanted plugins to be able to add vocabulary to the manifest. If the API was "provide a callback to extract the details of whatever the user provided here", then the result would be bad.

Most plugins would probably throw KeyError: X or ValueError style errors for quite a while. Worst case, they would end on my table because the user would have a hard time telling where debputy ends and where the plugins starts. "Best" case, I would teach debputy to say "This poor error message was brought to you by plugin foo. Go complain to them". Either way, it would be a bad user experience.

This even assumes plugin providers would actually bother writing manifest parsing code. If it is that difficult, then just providing a custom file in debian might tempt plugin providers and that would undermine the idea of having the manifest be the sole input for debputy.

So beyond me being unsatisfied with the current situation, it was also clear to me that I needed to come up with a better solution if I wanted externally provided plugins for debputy. To put a bit more perspective on what I expected from the end result:

It had to cover as many parsing errors as possible. An error case this code would handle for you, would be an error where I could ensure it sufficient degree of detail and context for the user.

It should be type-safe / provide typing support such that IDEs/mypy could help you when you work on the parsed result.

It had to support "normalization" of the input, such as

           # User provides
           - install: "foo"
           # Which is normalized into:
           - install:
               source: "foo"
4) It must be simple to tell  debputy  what input you expected.

At this point, I remembered that I had seen a Python (PYPI) package where you could give it a TypedDict and an arbitrary input (Sadly, I do not remember the name). The package would then validate the said input against the TypedDict. If the match was successful, you would get the result back casted as the TypedDict. If the match was unsuccessful, the code would raise an error for you. Conceptually, this seemed to be a good starting point for where I wanted to be. Then I looked a bit on the normalization requirement (point 3). What is really going on here is that you have two "schemas" for the input. One is what the programmer will see (the normalized form) and the other is what the user can input (the manifest form). The problem is providing an automatic normalization from the user input to the simplified programmer structure. To expand a bit on the following example:

# User provides
- install: "foo"
# Which is normalized into:
- install:
    source: "foo"

Given that install has the attributes source, sources, dest-dir, as, into, and when, how exactly would you automatically normalize "foo" (str) into source: "foo"? Even if the code filtered by "type" for these attributes, you would end up with at least source, dest-dir, and as as candidates. Turns out that TypedDict actually got this covered. But the Python package was not going in this direction, so I parked it here and started looking into doing my own. At this point, I had a general idea of what I wanted. When defining an extension to the manifest, the plugin would provide debputy with one or two definitions of TypedDict. The first one would be the "parsed" or "target" format, which would be the normalized form that plugin provider wanted to work on. For this example, lets look at an earlier version of the install-examples rule:

# Example input matching this typed dict.
#    
#       "source": ["foo"]
#       "into": ["pkg"]
#    
class InstallExamplesTargetFormat(TypedDict):
    # Which source files to install (dest-dir is fixed)
    sources: List[str]
    # Which package(s) that should have these files installed.
    into: NotRequired[List[str]]

In this form, the install-examples has two attributes - both are list of strings. On the flip side, what the user can input would look something like this:

# Example input matching this typed dict.
#    
#       "source": "foo"
#       "into": "pkg"
#    
#
class InstallExamplesManifestFormat(TypedDict):
    # Note that sources here is split into source (str) vs. sources (List[str])
    sources: NotRequired[List[str]]
    source: NotRequired[str]
    # We allow the user to write  into: foo  in addition to  into: [foo] 
    into: Union[str, List[str]]
FullInstallExamplesManifestFormat = Union[
    InstallExamplesManifestFormat,
    List[str],
    str,
]

The idea was that the plugin provider would use these two definitions to tell debputy how to parse install-examples. Pseudo-registration code could look something like:

def _handler(
    normalized_form: InstallExamplesTargetFormat,
) -> InstallRule:
    ...  # Do something with the normalized form and return an InstallRule.
concept_debputy_api.add_install_rule(
  keyword="install-examples",
  target_form=InstallExamplesTargetFormat,
  manifest_form=FullInstallExamplesManifestFormat,
  handler=_handler,
)

This was my conceptual target and while the current actual API ended up being slightly different, the core concept remains the same.

From concept to basic implementation Building this code is kind like swallowing an elephant. There was no way I would just sit down and write it from one end to the other. So the first prototype of this did not have all the features it has now. Spoiler warning, these next couple of sections will contain some Python typing details. When reading this, it might be helpful to know things such as Union[str, List[str]] being the Python type for either a str (string) or a List[str] (list of strings). If typing makes your head spin, these sections might less interesting for you. To build this required a lot of playing around with Python's introspection and typing APIs. My very first draft only had one "schema" (the normalized form) and had the following features:

Read TypedDict.__required_attributes__ and TypedDict.__optional_attributes__ to determine which attributes where present and which were required. This was used for reporting errors when the input did not match.

Read the types of the provided TypedDict, strip the Required / NotRequired markers and use basic isinstance checks based on the resulting type for str and List[str]. Again, used for reporting errors when the input did not match.

This prototype did not take a long (I remember it being within a day) and worked surprisingly well though with some poor error messages here and there. Now came the first challenge, adding the manifest format schema plus relevant normalization rules. The very first normalization I did was transforming into: Union[str, List[str]] into into: List[str]. At that time, source was not a separate attribute. Instead, sources was a Union[str, List[str]], so it was the only normalization I needed for all my use-cases at the time. There are two problems when writing a normalization. First is determining what the "source" type is, what the target type is and how they relate. The second is providing a runtime rule for normalizing from the manifest format into the target format. Keeping it simple, the runtime normalizer for Union[str, List[str]] -> List[str] was written as:

def normalize_into_list(x: Union[str, List[str]]) -> List[str]:
    return x if isinstance(x, list) else [x]

This basic form basically works for all types (assuming none of the types will have List[List[...]]). The logic for determining when this rule is applicable is slightly more involved. My current code is about 100 lines of Python code that would probably lose most of the casual readers. For the interested, you are looking for _union_narrowing in declarative_parser.py With this, when the manifest format had Union[str, List[str]] and the target format had List[str] the generated parser would silently map a string into a list of strings for the plugin provider. But with that in place, I had covered the basics of what I needed to get started. I was quite excited about this milestone of having my first keyword parsed without handwriting the parser logic (at the expense of writing a more generic parse-generator framework).

Adding the first parse hint With the basic implementation done, I looked at what to do next. As mentioned, at the time sources in the manifest format was Union[str, List[str]] and I considered to split into a source: str and a sources: List[str] on the manifest side while keeping the normalized form as sources: List[str]. I ended up committing to this change and that meant I had to solve the problem getting my parser generator to understand the situation:

# Map from
class InstallExamplesManifestFormat(TypedDict):
    # Note that sources here is split into source (str) vs. sources (List[str])
    sources: NotRequired[List[str]]
    source: NotRequired[str]
    # We allow the user to write  into: foo  in addition to  into: [foo] 
    into: Union[str, List[str]]
# ... into
class InstallExamplesTargetFormat(TypedDict):
    # Which source files to install (dest-dir is fixed)
    sources: List[str]
    # Which package(s) that should have these files installed.
    into: NotRequired[List[str]]

There are two related problems to solve here:

How will the parser generator understand that source should be normalized and then mapped into sources?

Once that is solved, the parser generator has to understand that while source and sources are declared as NotRequired, they are part of a exactly one of rule (since sources in the target form is Required). This mainly came down to extra book keeping and an extra layer of validation once the previous step is solved.

While working on all of this type introspection for Python, I had noted the Annotated[X, ...] type. It is basically a fake type that enables you to attach metadata into the type system. A very random example:

# For all intents and purposes,  foo  is a string despite all the  Annotated  stuff.
foo: Annotated[str, "hello world"] = "my string here"

The exciting thing is that you can put arbitrary details into the type field and read it out again in your introspection code. Which meant, I could add "parse hints" into the type. Some "quick" prototyping later (a day or so), I got the following to work:

# Map from
#      
#        "source": "foo"  # (or "sources": ["foo"])
#        "into": "pkg"
#      
class InstallExamplesManifestFormat(TypedDict):
    # Note that sources here is split into source (str) vs. sources (List[str])
    sources: NotRequired[List[str]]
    source: NotRequired[
        Annotated[
            str,
            DebputyParseHint.target_attribute("sources")
        ]
    ]
    # We allow the user to write  into: foo  in addition to  into: [foo] 
    into: Union[str, List[str]]
# ... into
#      
#        "source": ["foo"]
#        "into": ["pkg"]
#      
class InstallExamplesTargetFormat(TypedDict):
    # Which source files to install (dest-dir is fixed)
    sources: List[str]
    # Which package(s) that should have these files installed.
    into: NotRequired[List[str]]

Without me (as a plugin provider) writing a line of code, I can have debputy rename or "merge" attributes from the manifest form into the normalized form. Obviously, this required me (as the debputy maintainer) to write a lot code so other me and future plugin providers did not have to write it.

High level typing At this point, basic normalization between one mapping to another mapping form worked. But one thing irked me with these install rules. The into was a list of strings when the parser handed them over to me. However, I needed to map them to the actual BinaryPackage (for technical reasons). While I felt I was careful with my manual mapping, I knew this was exactly the kind of case where a busy programmer would skip the "is this a known package name" check and some user would typo their package resulting in an opaque KeyError: foo. Side note: "Some user" was me today and I was super glad to see debputy tell me that I had typoed a package name (I would have been more happy if I had remembered to use debputy check-manifest, so I did not have to wait through the upstream part of the build that happened before debhelper passed control to debputy...) I thought adding this feature would be simple enough. It basically needs two things:

Conversion table where the parser generator can tell that BinaryPackage requires an input of str and a callback to map from str to BinaryPackage. (That is probably lie. I think the conversion table came later, but honestly I do remember and I am not digging into the git history for this one)

At runtime, said callback needed access to the list of known packages, so it could resolve the provided string.

It was not super difficult given the existing infrastructure, but it did take some hours of coding and debugging. Additionally, I added a parse hint to support making the into conditional based on whether it was a single binary package. With this done, you could now write something like:

# Map from
class InstallExamplesManifestFormat(TypedDict):
    # Note that sources here is split into source (str) vs. sources (List[str])
    sources: NotRequired[List[str]]
    source: NotRequired[
        Annotated[
            str,
            DebputyParseHint.target_attribute("sources")
        ]
    ]
    # We allow the user to write  into: foo  in addition to  into: [foo] 
    into: Union[BinaryPackage, List[BinaryPackage]]
# ... into
class InstallExamplesTargetFormat(TypedDict):
    # Which source files to install (dest-dir is fixed)
    sources: List[str]
    # Which package(s) that should have these files installed.
    into: NotRequired[
        Annotated[
            List[BinaryPackage],
            DebputyParseHint.required_when_multi_binary()
        ]
    ]

Code-wise, I still had to check for into being absent and providing a default for that case (that is still true in the current codebase - I will hopefully fix that eventually). But I now had less room for mistakes and a standardized error message when you misspell the package name, which was a plus.

The added side-effect - Introspection A lovely side-effect of all the parsing logic being provided to debputy in a declarative form was that the generated parser snippets had fields containing all expected attributes with their types, which attributes were required, etc. This meant that adding an introspection feature where you can ask debputy "What does an install rule look like?" was quite easy. The code base already knew all of this, so the "hard" part was resolving the input the to concrete rule and then rendering it to the user. I added this feature recently along with the ability to provide online documentation for parser rules. I covered that in more details in my blog post Providing online reference documentation for debputy in case you are interested. :)

Wrapping it up This was a short insight into how debputy parses your input. With this declarative technique:

The parser engine handles most of the error reporting meaning users get most of the errors in a standard format without the plugin provider having to spend any effort on it. There will be some effort in more complex cases. But the common cases are done for you.

It is easy to provide flexibility to users while avoiding having to write code to normalize the user input into a simplified programmer oriented format.

The parser handles mapping from basic types into higher forms for you. These days, we have high level types like FileSystemMode (either an octal or a symbolic mode), different kind of file system matches depending on whether globs should be performed, etc. These types includes their own validation and parsing rules that debputy handles for you.

Introspection and support for providing online reference documentation. Also, debputy checks that the provided attribute documentation covers all the attributes in the manifest form. If you add a new attribute, debputy will remind you if you forget to document it as well. :)

In this way everybody wins. Yes, writing this parser generator code was more enjoyable than writing the ad-hoc manual parsers it replaced. :)

Fran ois Marier: Proper Multicast DNS Handling with NetworkManager and systemd-resolved

Using NetworkManager and systemd-resolved together in Debian bookworm does not work out of the box. The first sign of trouble was these constant messages in my logs:

avahi-daemon[pid]: Host name conflict, retrying with hostname-2

Then I realized that CUPS printer discovery didn't work: my network printer could not be found. Since this discovery now relies on Multicast DNS, it would make sense that both problems are related to an incompatibility between NetworkManager and Avahi.

What didn't work The first attempt I made at fixing this was to look for known bugs in Avahi. Neither of the work-arounds I found worked:

the one proposed in https://github.com/avahi/avahi/issues/117#issuecomment-1651475104:
```
  [publish]
  publish-aaaa-on-ipv4=no
  publish-a-on-ipv6=no
```
nor the one proposed in https://github.com/avahi/avahi/issues/117#issuecomment-442201162:
```
  [server]
  cache-entries-max=0
```

What worked The real problem turned out to be the fact that NetworkManager turns on full mDNS support in systemd-resolved which conflicts with the mDNS support in avahi-daemon. You can see this in the output of resolvectl status:

Global
       Protocols: -LLMNR +mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub
Link 2 (enp6s0)
    Current Scopes: DNS mDNS/IPv4 mDNS/IPv6
         Protocols: +DefaultRoute -LLMNR +mDNS -DNSOverTLS
                    DNSSEC=no/unsupported
Current DNS Server: 192.168.1.1
       DNS Servers: 192.168.1.1
        DNS Domain: lan

which includes +mDNS for the main network adapter. I initially thought that I could just uninstall avahi-daemon and rely on the systemd-resolved mDNS stack, but it's not actually compatible with CUPS. The solution was to tell NetworkManager to set mDNS to resolve-only mode in systemd-resolved by adding the following to /etc/NetworkManager/conf.d/mdns.conf:

[connection]
connection.mdns=1

leaving /etc/avahi/avahi-daemon.conf to the default Debian configuration.

Verifying the configuration After rebooting, resolvectl status now shows the following:

Global
       Protocols: -LLMNR +mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub
Link 2 (enp6s0)
    Current Scopes: DNS mDNS/IPv4 mDNS/IPv6
         Protocols: +DefaultRoute -LLMNR mDNS=resolve -DNSOverTLS
                    DNSSEC=no/unsupported
Current DNS Server: 192.168.1.1
       DNS Servers: 192.168.1.1
        DNS Domain: lan

Avahi finally sees my printer (called hp in the output below):

$ avahi-browse -at   grep Printer
+ enp6s0 IPv6 hp @ myprintserver   Secure Internet Printer local
+ enp6s0 IPv4 hp @ myprintserver   Secure Internet Printer local
+ enp6s0 IPv6 hp @ myprintserver   Internet Printer        local
+ enp6s0 IPv4 hp @ myprintserver   Internet Printer        local
+ enp6s0 IPv6 hp @ myprintserver   UNIX Printer            local
+ enp6s0 IPv4 hp @ myprintserver   UNIX Printer            local

and so does CUPS:

$ sudo lpinfo --include-schemes dnssd -v
network dnssd://myprintserver%20%40%20hp._ipp._tcp.local/cups?uuid=d46942a2-b730-11ee-b05c-a75251a34287

Firewall rules Since printer discovery in CUPS relies on mDNS, another thing to double-check is that the correct ports are open on the firewall. This is what I have in /etc/network/iptables.up.rules:

# Allow mDNS for local service discovery
-A INPUT -d 100.64.0.0/10 -p udp --dport 5353 -j ACCEPT
-A INPUT -d 192.168.1.0/24 -p udp --dport 5353 -j ACCEPT

and in etc/network/ip6tables.up.rules:

# Allow mDNS for local service discovery
-A INPUT -d ff02::/16 -p udp --dport 5353 -j ACCEPT

Next.

Previous.